Abstract

A family of equivalence tools for bounding network capacities is introduced. Part I treats networks
built from point-to-point channels. Part II generalizes the technique to networks containing wireless
channels such as broadcast, multiple access, and interference channels. The main result of part I is roughly 
as follows. Given a network of noisy, independent, memoryless point-to-point channels, a collection of 
demands can be met on the given network if and only if it can be met on another network where each 
noisy channel is replaced by a noiseless bit pipe with throughput equal to the noisy channel capacity. 
This result was known previously for the case of a single-source multicast demand. The result given
here treats general demands (including, for example, multiple unicast demands) and applies even when
the achievable rate region for the corresponding demands is unknown in both the noisy network and its 
noiseless counterpart. 
I. Introduction

The study of network communications has two natural facets reflecting different approaches to thinking 
about networks. On the one hand, networks are considered in the graph theoretic setup consisting of 
nodes connected by links. The links are typically not noisy channels but noise-free bit pipes that can be 
used error free up to a certain capacity. Typical concepts include information flows and routing issues. On 
the other hand, multiterminal information theory addresses information transmission through networks by
studying noisy channels, or rather the stochastic relationship between input and output signals at devices 
in a network. Here the questions typically concern fundamental limits of communication. The capacity 
regions of broadcast, multiple access, and interference channels are all examples of questions that are 
addressed in the context of multiterminal information theory. These questions appear to have no obvious 
equivalent in networks consisting of error free bit pipes. Nevertheless, these two views of networking are 
two natural facets of the same problem, namely communication through networks. This paper explores 
the relationship between these two worlds. 

Establishing viable bridges between these two areas proves to be surprisingly fertile. For example,
questions about feedback in multiterminal systems are quite nicely expressed in networks of error free
bit pipes. Separation issues, in particular separation between network coding and channel coding,
have natural answers, revealing many network capacity problems as combinatorial rather than statistical,
even when communication occurs across networks of noisy channels. Most importantly, bounding general
network capacities reduces to solving a central network coding problem described as follows: Given a
network of error free rate-constrained bit pipes, is a given set of demands (e.g., a collection of unicast
and multicast connections) simultaneously satisfiable or not? In certain situations, most notably a single
multicast demand, this question is solved, and the answer is easily characterized [1]. Unfortunately, the 
general case is wide open and suspected to be hard. (Currently, NP hardness is only established for 
linear network coding [2].) While it appears that fully characterizing the combinatorial network coding 
problem is out of reach [3], moderate size networks can be solved quite efficiently, and there are algorithms 
available that, with running time that is exponential in the number of nodes, treat precisely this problem 
for general demands (H, J5], JH. The possibility of characterizing, in principle, the rate region of a 
combinatorial network coding problem will be a cornerstone of our investigations.

The combinatorial nature of the network coding problem creates a situation not unlike issues in complexity 
theory. In that case, since precise expressions as to how difficult a problem is in absolute terms are difficult 
to derive, research is instead devoted to showing that one problem is essentially as difficult as another 
one (even though precise characterizations are not available for either). Inspired by this analogy, we here 
take a similar approach, characterizing the relationship between arbitrary network capacity problems and 
the central combinatorial network coding capacity problem. This characterization is, in fact, all we need 
if we want to address separation issues in networks. It also opens the door to other questions, such as 
degree-of-freedom or high signal to noise ratio analyses, which reveal interesting insights. 



It is interesting to note the variety of new tools generated in recent years for studying network capacities 
(e.g., ID, 0, H, 0, 0, OH, O, 0, El, 0)- The reduction of a network information theoretic 
question to its combinatorial essence is also at the heart of some of these publications (see, e.g. CGI). Our 
approach is very different in terms of technique and also results, focusing not on the solution of network 
capacities when good outer bounds are available but on proving relationships between capacity regions 
even (or especially) when these capacity regions remain inaccessible using available analytical techniques. 
Nonetheless, we believe it to be no coincidence that the reduction of a problem to its combinatorial essence 
plays a central role in a variety of techniques for studying network capacities. 

II. Intuition and Summary of Results 

The goal of finding capacities for general networks under general demands is currently out of reach. 
Establishing connections between the networking and information theoretic views of network communications
simplifies the task by allowing us to identify both the stochastic and the combinatorial facets of the
communication problem and to apply the appropriate tools to each. For example, consider a network of 
independent, memoryless, noisy point-to-point channels. To derive the multicast capacity of this network, 
Borade and Song, Yeung, and Cai [13] first find the noisy network's cut-set outer bound and then 
demonstrate the achievability of that bound by applying a multicast network code over point-to-point 
channels made reliable using independent channel coding on each point-to-point channel. The resulting 
separation theorem establishes one tight connection between the two natural views of communication 
networks. This paper considers whether similar connections can be established for general demands. 
Relating the capacity of stochastic networks to the network coding capacity allows us to apply analytical 
and computational tools from the network coding literature (e.g., 0, 0, 0) to bound the capacity of 
networks of stochastic channels. 

While it is tempting to believe that the separation result derived for a single-source multicast demand 
in Q, |fT3l should also apply under general demands, it is clear that the proof technique does not. That is, 
first establishing a tight outer bound and then showing that that outer bound can be achieved by separate 
network and channel coding is not a feasible strategy for treating all possible demand types over all 
possible network topologies. The proof is further complicated by the observation that joint channel and 
network codes have a variety of clear advantages over separated codes even when separated strategies 
suffice to achieve the network capacity. Example 1 illustrates one such advantage, showing that operating
channels above their respective capacities can improve communication reliability across the network as 
Fig. 1. (a) The network discussed in the comparison of separate network and channel coding to joint network and channel
coding in Example 1. (b) A pair of $(2^{nR}, n)$ channel codes; each is used to reliably transmit $nR$ bits over $n$ uses of a single
channel in the separated strategy. (c) A single $(2^{2nR}, 2n)$ channel code; this is used to reliably transmit information across $n$
uses of the pair of channels in the joint coding strategy. The joint coding strategy achieves twice the error exponent by operating
each channel at roughly twice its capacity.



a whole. It remains to be determined whether operating channels above their capacities can also increase 
the achievable rate region for cases beyond the single-source multicast demand studied in Q, lfT3l . 

Example 1 Consider the problem of establishing a unicast connection over the two-node network shown
in Figure 1(a). Node 1 transmits a pair of channel inputs $x^{(1)} = (x^{(1,1)}, x^{(1,2)})$. Node 2 receives a pair
of channel outputs $y^{(2)} = (y^{(2,1)}, y^{(2,2)})$. The inputs and outputs are stochastically related through a pair
of independent but identical channels, thus

$$p(y^{(2,1)}, y^{(2,2)} | x^{(1,1)}, x^{(1,2)}) = p(y^{(2,1)} | x^{(1,1)}) \, p(y^{(2,2)} | x^{(1,2)})$$

for all $(x^{(1,1)}, x^{(1,2)}, y^{(2,1)}, y^{(2,2)})$, and

$$p(y^{(2,1)} | x^{(1,1)}) = p(y^{(2,2)} | x^{(1,2)})$$

for all $(x^{(1,1)}, y^{(2,1)}) = (x^{(1,2)}, y^{(2,2)})$. For each rate $R < C = \max_{p(x)} I(X;Y)$ and each blocklength $n$,
we compare two strategies for reliably communicating from node 1 to node 2. The first (see Figure 1(b))
is an optimal separate network and channel code that reliably communicates across each channel using an
optimal $(2^{nR}, n)$ channel code. The second strategy (see Figure 1(c)) applies a single optimal $(2^{2nR}, 2n)$
channel code across the pair of channels, sending the first $n$ symbols of each codeword across the first
channel and the remaining $n$ symbols across the second channel. The decoder observes the outputs of
both channels and reliably decodes using its blocklength-$2n$ channel decoder. Using this approach, each
channel has $2^{2nR}$ possible inputs. Thus when $R$ is close to $C$, this joint channel and network code
operates each channel at roughly twice its capacity, making reliable transmission across each channel
alone impossible. Since the joint code operates a $2n$-dimensional code over $n$ time steps, it achieves a
better error exponent than the separated code. ■
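The comparison in Example 1 can be made numerical with a small sketch. The code below is an illustration only, not part of the paper's argument: it assumes both channels are binary symmetric channels with a hypothetical crossover probability, and it uses Gallager's random coding exponent as a stand-in for the error exponent of an "optimal" code. The function names are ours.

```python
import math


def random_coding_exponent(rate, crossover, steps=1000):
    """Gallager random coding exponent E_r(R) in bits for a BSC with uniform inputs.

    E_r(R) = max over rho in [0, 1] of E_0(rho) - rho * R, found by grid search.
    """
    best = 0.0
    for k in range(steps + 1):
        rho = k / steps
        s = 1.0 / (1.0 + rho)
        e0 = rho - (1.0 + rho) * math.log2(crossover ** s + (1.0 - crossover) ** s)
        best = max(best, e0 - rho * rate)
    return best


def separated_error_bound(rate, crossover, n):
    """Union bound for two independent (2^{nR}, n) codes, one per channel."""
    exponent = random_coding_exponent(rate, crossover)
    return min(1.0, 2.0 * 2.0 ** (-n * exponent))


def joint_error_bound(rate, crossover, n):
    """Bound for one (2^{2nR}, 2n) code spread across both channels.

    The code has rate R per symbol but blocklength 2n, so the exponent doubles.
    """
    exponent = random_coding_exponent(rate, crossover)
    return min(1.0, 2.0 ** (-2 * n * exponent))


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, separated_error_bound(0.3, 0.1, n), joint_error_bound(0.3, 0.1, n))
```

Under these assumptions the joint bound decays with exponent $2nE_r(R)$ while the separated bound decays with exponent $nE_r(R)$, which matches the factor of two described in the caption of Fig. 1.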



Our main result is roughly as follows. An arbitrary collection of demands can be met on a network of 
noisy, independent, memoryless point-to-point channels, if and only if the same demands can be met 
on another network where each noisy channel is replaced by a noiseless bit pipe of the corresponding 
capacity. This result agrees with Q, lfT3l in the case of multicast demands. 

Our proof introduces a new technique for bounding the capacity region of one network in terms of the 
capacity region of another network. Critically, this approach can be employed even when the capacity 
regions of both networks are unknown. We prove equivalence by first showing that the rate region for 
the noiseless bit-pipe network is a subset of that for the network of noisy channels and then showing 
that the rate region for the network of noiseless bit pipes is a superset of that for the network of noisy 
channels. In each case, we show the desired relationship by demonstrating that codes that can be operated 
reliably on one network can be operated with similar error probability on the other network. Codes 
for the bit-pipe network can be operated across the network of noisy channels using an independent 
channel code across each channel. Operating codes for the network of noisy channels across the bit- 
pipe network is more difficult since networks of noisy channels allow a far richer algorithmic behavior 
than networks of noiseless bit pipes. While it is known that a noiseless bit-pipe of a given throughput 
can emulate any discrete memoryless channel of lesser capacity [14], applying this result seems to be
difficult. Difficulties arise with continuous random variables, timing questions, and proving continuity of 
rate regions in the channel statistics. Worst of all, since we do not know which strategy achieves the 
network capacity, we must be able to emulate all of them. We therefore prove our main claim directly, 
without exploiting [14]. We use a source coding argument to show that we can emulate each noisy channel 
across the corresponding noiseless bit pipe to sufficient accuracy that any code designed for the network 
of noisy channels can be operated across the noiseless bit-pipe network with a similar error probability. 
It is important to note that the given approach does not require knowing the rate region of either network 
nor what the optimal codes look like, and it never answers the question of whether a particular rate point 
is in the rate region or not. The proofs only demonstrate that any rate point in the interior of the rate 
region for one network must also be in the interior of the rate region for the other network. 

The given relationship between networks of noisy point-to-point channels and networks of noiseless bit- 
pipes has a number of surprisingly powerful consequences. For example, it demonstrates that at its core 
characterizing network capacity is a combinatorial problem rather than a probabilistic one: Shannon's
channel coding theorem tells us everything that we need to know about the noise in independent, point- 
to-point channels. Understanding the relationship between the two facets of network communications
likewise lends insight into a variety of network information theoretic questions. For example, the classical
result that feedback does not increase the capacity of a point-to-point channel now can be proven in 
two ways. The first is the classical information theoretic argument that shows that the channel has no 
information that is useful to the transmitter that the transmitter does not already know. The second 
observes that the min-cut between the transmitter and the receiver in the equivalent network is the 
same with or without feedback; therefore feedback does not increase capacity. While both proofs lead 
to the same well-known result, the latter is easier to generalize. For example, the following result is 
an immediate consequence of the given network equivalence and the well-known characterization of 
the multicast capacity in network coding 12. Given any network of noisy, memoryless, point-to-point 
channels and any multicast demand, feedback increases the multicast capacity if and only if it increases 
the min-cut on the equivalent deterministic network. Likewise, since capacities are known for a variety of 
network coding problems JS), we can immediately determine whether feedback increases the achievable 
rate regions for a variety of other demand types (e.g., multiple-source multicast demands, single-source 
non-overlapping demands, and single-source non-overlapping plus multicast demands). 
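
The min-cut argument above can be checked mechanically on the equivalent bit-pipe network. The following sketch is illustrative only: the max-flow routine is a standard Edmonds-Karp implementation, and the node labels, capacities, and the three-edge relay example are hypothetical values chosen for the demonstration. Multicast capacity is computed as the minimum over sinks of the max-flow from the source, following the characterization cited in the text.

```python
from collections import deque


def max_flow(edges, source, sink):
    """Edmonds-Karp max flow; `edges` is a list of (tail, head, capacity) bit pipes."""
    residual = {}
    for tail, head, capacity in edges:
        residual.setdefault(tail, {})
        residual.setdefault(head, {})
        residual[tail][head] = residual[tail].get(head, 0.0) + capacity
        residual[head].setdefault(tail, 0.0)
    residual.setdefault(source, {})
    residual.setdefault(sink, {})
    total = 0.0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            node = queue.popleft()
            for neighbor, capacity in residual[node].items():
                if capacity > 1e-12 and neighbor not in parent:
                    parent[neighbor] = node
                    queue.append(neighbor)
        if sink not in parent:
            return total
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        node = sink
        while parent[node] is not None:
            bottleneck = min(bottleneck, residual[parent[node]][node])
            node = parent[node]
        node = sink
        while parent[node] is not None:
            previous = parent[node]
            residual[previous][node] -= bottleneck
            residual[node][previous] += bottleneck
            node = previous
        total += bottleneck


def multicast_capacity(edges, source, sinks):
    """Minimum over sinks of the source-to-sink max flow (equal to the min cut)."""
    return min(max_flow(edges, source, sink) for sink in sinks)


if __name__ == "__main__":
    # Point-to-point channel of capacity 0.5, with and without a feedback pipe.
    forward = [(1, 2, 0.5)]
    print(max_flow(forward, 1, 2), max_flow(forward + [(2, 1, 3.0)], 1, 2))
    # Hypothetical multicast network in which a reverse link from sink 3 helps sink 4.
    base = [(1, 3, 2.0), (1, 4, 1.0), (2, 4, 1.0)]
    print(multicast_capacity(base, 1, [3, 4]),
          multicast_capacity(base + [(3, 2, 1.0)], 1, [3, 4]))
```

In the point-to-point case the feedback pipe leaves the min-cut unchanged, as the text argues. In the hypothetical multicast example the added reverse link raises the min-cut to sink 4, so under the stated equivalence it would raise the multicast capacity.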

III. The Setup 

Our notation is similar to that of Cover and Thomas [15, Section 15.10]. A multiterminal network is
defined by a vertex set $V = \{1, \ldots, m\}$ with associated random variables $X^{(v)} \in \mathcal{X}^{(v)}$, which are
transmitted from node $v$, and $Y^{(v)} \in \mathcal{Y}^{(v)}$, which are received at node $v$. The alphabets $\mathcal{X}^{(v)}$ and $\mathcal{Y}^{(v)}$
may be discrete or continuous. They may also be vectors or scalars. For example, if node $v$ transmits
information over $k$ binary symmetric channels, then $\mathcal{X}^{(v)} = \{0, 1\}^k$. The network is assumed to be
memoryless and characterized by a conditional probability distribution

$$p(y|x) = p(y^{(1)}, \ldots, y^{(m)} | x^{(1)}, \ldots, x^{(m)}).$$

Note that for continuous random variables this assumption implies that we restrict our attention to cases
where this conditional distribution (in this case a conditional probability density function) exists. A code of
blocklength $n$ operates the network over $n$ time steps with the goal of communicating, for each distinct
pair of nodes $u$ and $v$, message

$$W^{(u \to v)} \in \mathcal{W}^{(u \to v)} \stackrel{\mathrm{def}}{=} \{1, \ldots, 2^{nR^{(u \to v)}}\}$$

from source node $u$ to sink node $v$. The messages $W^{(u \to v)}$ are independent and uniformly distributed by
assumption (the proof also goes through unchanged if the same message is available at more than one




Fig. 2. An $m$-node network containing a channel $p(y^{(j,1)} | x^{(i,1)})$ from node $i$ to node $j$. Here $x^{(i)} = (x^{(i,1)}, x^{(i,2)})$,
$y^{(j)} = (y^{(j,1)}, y^{(j,2)})$, and the distribution
$p(y^{(1)}, \ldots, y^{(j-1)}, y^{(j,2)}, y^{(j+1)}, \ldots, y^{(m)} | x^{(1)}, \ldots, x^{(i-1)}, x^{(i,2)}, x^{(i+1)}, \ldots, x^{(m)})$ on the
remaining channel outputs given the remaining channel inputs is arbitrary.



node in the network). The vector of messages $W^{(u \to v)}$ is denoted by $W$. The constant $R^{(u \to v)}$ is called
the rate of the transmission, and the vector of rates $R^{(u \to v)}$ is denoted by $\mathcal{R}$. Since no message is required
from a node $u$ to itself, $R^{(u \to u)} = 0$, and $\mathcal{R}$ is treated as an $m(m-1)$-dimensional vector. By [16], for
any network coding problem with generic demands, we can construct a multiple unicast problem such
that the given demands can be met in the original network if and only if the unicast demands can be
met in the constructed network. This argument generalizes immediately to the network model presented
here. Therefore, there is no loss of generality (and considerable simplification of notation) in describing
messages for all node pairs ($(u, v) \in \{1, \ldots, m\}^2$ such that $u \neq v$) rather than messages for all possible
multicasts ($(u, B)$ with $u \in \{1, \ldots, m\}$ and $B \subseteq \{1, \ldots, m\} \setminus \{u\}$).

We denote the random variable transmitted by node $v$ at time $t$ as $X_t^{(v)}$ and the full vector of time-$t$
transmissions by all nodes as $X_t$. We likewise denote the random variable received by node $v$ at time $t$
by $Y_t^{(v)}$ and the full vector of time-$t$ channel outputs by $Y_t$. A network is written as a triple

$$\mathcal{N} \stackrel{\mathrm{def}}{=} \left( \prod_{v=1}^{m} \mathcal{X}^{(v)}, \; p(y|x), \; \prod_{v=1}^{m} \mathcal{Y}^{(v)} \right)$$

with the additional constraint that random variable $X_t^{(v)}$ is a function of random variables

$$\{Y_1^{(v)}, \ldots, Y_{t-1}^{(v)}, W^{(v \to 1)}, \ldots, W^{(v \to m)}\}$$

alone.
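
The causality constraint can be pictured as a simulation loop in which each node sees only its own past channel outputs and its own outgoing messages. The sketch below is a toy illustration under assumptions of ours: the function names, the dictionary-based interfaces, and the two-node binary symmetric channel are hypothetical and are not taken from the paper.

```python
import random


def run_code(encoders, channel, messages, blocklength, rng):
    """Operate a blocklength-n code on a memoryless network.

    encoders[v](t, past_outputs, outgoing_messages) returns X_t^(v); it is given
    only node v's own past outputs Y_1^(v), ..., Y_{t-1}^(v) and its own messages.
    channel(x_t, rng) draws the time-t outputs from p(y|x) using x_t only.
    """
    nodes = sorted(encoders)
    received = {v: [] for v in nodes}
    for t in range(1, blocklength + 1):
        x_t = {v: encoders[v](t, tuple(received[v]), messages[v]) for v in nodes}
        y_t = channel(x_t, rng)
        for v in nodes:
            received[v].append(y_t[v])
    return received


def bsc_network(flip_probability):
    """Toy two-node network: node 1 reaches node 2 through a binary symmetric channel."""
    def channel(x_t, rng):
        noise = 1 if rng.random() < flip_probability else 0
        return {1: 0, 2: x_t[1] ^ noise}
    return channel


if __name__ == "__main__":
    encoders = {
        1: lambda t, past, msgs: msgs[2],  # repeat the bit destined for node 2
        2: lambda t, past, msgs: 0,        # node 2 has nothing to send
    }
    outputs = run_code(encoders, bsc_network(0.1), {1: {2: 1}, 2: {}}, 7, random.Random(0))
    print(outputs[2])
```

A decoder for node 2 would then be any function of `outputs[2]` and node 2's own messages, for example a majority vote in this repetition-code toy.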



While this characterization is very general, it does not exploit any information about the network's
structure. Later discussion treats networks made entirely from point-to-point channels, but we begin by
considering a network that is completely arbitrary except for its inclusion of a point-to-point channel from
node $i$ to node $j$, as shown in Figure 2, that is independent of the remainder of the network distribution.
Precisely, the conditional distribution on all channel outputs given all channel inputs factors as

$$p(y|x) = p(y^{(j,1)} | x^{(i,1)}) \, p(y^{(1)}, \ldots, y^{(j-1)}, y^{(j,2)}, y^{(j+1)}, \ldots, y^{(m)} | x^{(1)}, \ldots, x^{(i-1)}, x^{(i,2)}, x^{(i+1)}, \ldots, x^{(m)}),$$

where $\mathcal{X}^{(i)} = \mathcal{X}^{(i,1)} \times \mathcal{X}^{(i,2)}$, $\mathcal{Y}^{(j)} = \mathcal{Y}^{(j,1)} \times \mathcal{Y}^{(j,2)}$, $(\mathcal{X}^{(i,1)}, p(y^{(j,1)} | x^{(i,1)}), \mathcal{Y}^{(j,1)})$ is the point-to-point
channel, and

$$\left( \mathcal{X}^{(i,2)} \times \prod_{v \neq i} \mathcal{X}^{(v)}, \; p(y^{(1)}, \ldots, y^{(j-1)}, y^{(j,2)}, y^{(j+1)}, \ldots, y^{(m)} | x^{(1)}, \ldots, x^{(i-1)}, x^{(i,2)}, x^{(i+1)}, \ldots, x^{(m)}), \; \mathcal{Y}^{(j,2)} \times \prod_{v \neq j} \mathcal{Y}^{(v)} \right)$$

is the remainder of the network. As mentioned previously, for continuous-alphabet channels we restrict
our attention to networks for which the above conditional probability density functions exist. We also
restrict our attention to channels for which the input distribution that achieves the capacity of channel
$(\mathcal{X}^{(i,1)}, p(y^{(j,1)} | x^{(i,1)}), \mathcal{Y}^{(j,1)})$ in isolation has a probability density function. This includes most of the
continuous channels studied in the literature.

The notation $X = (X^{(i,1)}, X^{-(i,1)})$ and $Y = (Y^{(j,1)}, Y^{-(j,1)})$ is sometimes useful to succinctly distinguish
the input and output of the point-to-point channel from the remainder of the network channel inputs
and outputs. Using this notation, an $m$-node network containing an independent point-to-point channel
from node $i$ to node $j$ is written as

$$\mathcal{N} = \left( \mathcal{X}^{-(i,1)} \times \mathcal{X}^{(i,1)}, \; p(y^{(j,1)} | x^{(i,1)}) \, p(y^{-(j,1)} | x^{-(i,1)}), \; \mathcal{Y}^{-(j,1)} \times \mathcal{Y}^{(j,1)} \right). \qquad (2)$$

Figure 1(a) shows one example where the remainder of the network is itself a point-to-point channel. In
this paper we want to investigate some information theoretic aspects of replacing factor $p(y^{(j,1)} | x^{(i,1)})$.

Remark 1 The given definitions are sufficiently general to model a wide variety of memoryless channel
types. For example, the distribution $p(y^{-(j,1)} | x^{-(i,1)})$ may model wireless components like broadcast,
multiple access, and interference channels. If $\mathcal{X}^{(i,1)}$ and $\mathcal{Y}^{(j,1)}$ are vector alphabets, then the channel
from node $i$ to node $j$ is a point-to-point MIMO channel. In some situations it is important to be able to
embed the transmissions of various nodes in a schedule which may or may not depend on the messages to
be sent and the symbols that were received in the network. It is straightforward to model such a situation
in the above setup by including in the input and output alphabets symbols for the case when nothing was
sent on a particular node input. In this way we can assume that at each time $t$ random variables $X_t^{(v)}$
and $Y_t^{(v)}$ are given.



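Definition 1 Let a network

$$\mathcal{N} \stackrel{\mathrm{def}}{=} \left( \prod_{v=1}^{m} \mathcal{X}^{(v)}, \; p(y|x), \; \prod_{v=1}^{m} \mathcal{Y}^{(v)} \right)$$

be given. A blocklength-$n$ solution $S(\mathcal{N})$ to this network is defined as a set of encoding and decoding
functions:

$$X_t^{(v)} : (\mathcal{Y}^{(v)})^{t-1} \times \prod_{v'=1}^{m} \mathcal{W}^{(v \to v')} \to \mathcal{X}^{(v)}$$

$$\hat{W}^{(u \to v)} : (\mathcal{Y}^{(v)})^{n} \times \prod_{v'=1}^{m} \mathcal{W}^{(v \to v')} \to \mathcal{W}^{(u \to v)}$$

mapping $(Y_1^{(v)}, \ldots, Y_{t-1}^{(v)}, W^{(v \to 1)}, \ldots, W^{(v \to m)})$ to $X_t^{(v)}$ for each $v \in V$ and $t \in \{1, \ldots, n\}$ and mapping
$(Y_1^{(v)}, \ldots, Y_n^{(v)}, W^{(v \to 1)}, \ldots, W^{(v \to m)})$ to $\hat{W}^{(u \to v)}$ for each $u, v \in V$. The solution $S(\mathcal{N})$ is called a
$(\lambda, \mathcal{R})$-solution, denoted $(\lambda, \mathcal{R})$-$S(\mathcal{N})$, if $\Pr(W^{(u \to v)} \neq \hat{W}^{(u \to v)}) < \lambda$ for all source and sink pairs $u, v$
using the specified encoding and decoding functions.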

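Definition 2 The rate region $\mathscr{R}(\mathcal{N}) \subset \mathbb{R}^{m(m-1)}$ of a network $\mathcal{N}$ is the closure of all rate vectors $\mathcal{R}$ such
that for any $\lambda > 0$ and all $n$ sufficiently large, there exists a $(\lambda, \mathcal{R})$-$S(\mathcal{N})$ solution of blocklength $n$. We use
$\mathrm{int}(\mathscr{R}(\mathcal{N}))$ to denote the interior of rate region $\mathscr{R}(\mathcal{N})$.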

The goal of this paper is not to give the capacity regions of networks with respect to various demands,
which is an intractable problem. Rather, we wish to develop equivalence relationships between capacity
regions of networks. Given the existence of a solution $(\lambda, \mathcal{R})$-$S(\mathcal{N})$ of some blocklength $n$ for a network
$\mathcal{N}$, we seek to infer the existence of a solution $(\lambda', \mathcal{R}')$-$S(\mathcal{N}')$ of some blocklength
$n'$ for a network $\mathcal{N}'$.

To make this precise, consider a memoryless network $\mathcal{N}$ containing an independent channel from node
$i$ to node $j$. Then

$$p(y|x) = p(y^{(j,1)} | x^{(i,1)}) \, p(y^{-(j,1)} | x^{-(i,1)}).$$

Let another network $\mathcal{N}'$ be given with random variables $(\tilde{X}^{(i,1)}, \tilde{Y}^{(j,1)})$ replacing $(X^{(i,1)}, Y^{(j,1)})$ in $\mathcal{N}$.
We have replaced the point-to-point channel characterized by $p(y^{(j,1)} | x^{(i,1)})$ with another point-to-point
channel characterized by $p(\tilde{y}^{(j,1)} | \tilde{x}^{(i,1)})$. When $I(X^{(i,1)}; Y^{(j,1)}) < I(\tilde{X}^{(i,1)}; \tilde{Y}^{(j,1)})$, we want to prove
that the existence of a $(\lambda, \mathcal{R})$-$S(\mathcal{N})$ solution implies the existence of a $(\lambda', \mathcal{R}')$-$S(\mathcal{N}')$ solution, where
$\lambda'$ can be made arbitrarily small if $\lambda$ can. Since node $j$ need not decode $Y^{(j,1)}$, channel capacity is not
necessarily a relevant characterization of the channel's behavior. For example, a Gaussian channel from
Fig. 3. The 3-fold stacked network $\underline{\mathcal{N}}$ for the network $\mathcal{N}$ in Figure 1(a).

i to j might contribute a real- valued estimation of the input random variable; a binary erasure channel 
that replaces it cannot immediately deliver the same functionality. 

Our proof does not invent a coding scheme. Instead, we demonstrate a technique for operating any coding
scheme for $\mathcal{N}$ on the network $\mathcal{N}'$. Since there exists a coding scheme for $\mathcal{N}$ that achieves any point in the
interior of $\mathscr{R}(\mathcal{N})$, showing that we can operate all codes for $\mathcal{N}$ on $\mathcal{N}'$ proves that $\mathscr{R}(\mathcal{N}) \subseteq \mathscr{R}(\mathcal{N}')$. We
do not know the form of an optimal code for $\mathcal{N}$. Therefore, our method must work for all possible codes
on $\mathcal{N}$. For example, it must succeed even when the code for $\mathcal{N}$ is time-varying. As a result, we cannot
apply typicality arguments across time. We introduce instead a notion of stacking in order to exploit
averaging arguments across multiple uses of the network rather than trying to apply such arguments
across time.

IV. Stacked Networks and Stacked Solutions 

An $N$-fold stacked network $\underline{\mathcal{N}}_N$ is the network $\mathcal{N}$ repeated $N$ times. That is, $\underline{\mathcal{N}}_N$ has $N$ copies of
each vertex $v \in \{1, \ldots, m\}$ and $N$ copies of the channel $p(y|x)$. Figure 3 shows the 3-fold stacked
network for the network in Figure 1(a). We abuse notation by simplifying $\underline{\mathcal{N}}_N$ to $\underline{\mathcal{N}}$ throughout, specifying
the number of layers in the stack ($N$) by context. Eventually, $N$ will be allowed to grow without bound
in order to exploit asymptotic typicality arguments. The $N$-fold stacked network is used to deliver $N$
independent messages $W^{(u \to v)}$ from each transmitter node $u$ to each receiver node $v$. All copies of a
node $v$ can, at each time $t$, collaborate in determining their channel inputs $\underline{X}_t^{(v)}$. Likewise, all copies of a
node $v$ can collaborate in reconstructing messages $\underline{W}^{(u \to v)}$. This potential for collaboration across the
layers of the stack seems to make the $N$-fold stacked network $\underline{\mathcal{N}}$ considerably more powerful than the
network $\mathcal{N}$ from which it was derived. However, the increase in the number of degrees of freedom in
a stacked network solution is accompanied by an increased burden in the reconstruction constraint. A
code for the stacked network is successful only if it decodes without error in every layer. This becomes
difficult as $N$ grows without bound.
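
A short calculation shows why the every-layer requirement is demanding. The sketch below assumes, for illustration, that the same code is applied independently in each layer so that layer errors are independent with a common probability; the function name and the numbers are ours.

```python
def any_layer_error_probability(per_layer_error, num_layers):
    """Probability that at least one of N independent layers decodes in error."""
    return 1.0 - (1.0 - per_layer_error) ** num_layers


if __name__ == "__main__":
    for num_layers in (1, 10, 100, 1000):
        print(num_layers, any_layer_error_probability(0.01, num_layers))
```

With a fixed per-layer error probability the overall error probability approaches one as the number of layers grows, so some form of coding across layers is needed to keep the stacked error small.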

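Since the $N$-fold stacked network contains $N$ copies of $\mathcal{N}$, it does not meet the definition of a network
(for example, its vertex set is a multiset and not a set$^1$). Thus new definitions are required. We carry
over notation and variable definitions from the network $\mathcal{N}$ to the stacked network $\underline{\mathcal{N}}$ by underlining
the variable names. So for any distinct $u, v \in \{1, \ldots, m\}$, $\underline{W}^{(u \to v)} \in \underline{\mathcal{W}}^{(u \to v)} \stackrel{\mathrm{def}}{=} (\mathcal{W}^{(u \to v)})^N$ is the
$N$-dimensional vector of messages that the $N$ copies of node $u$ send to the corresponding copies of
node $v$, and $\underline{X}_t^{(v)} \in \underline{\mathcal{X}}^{(v)} \stackrel{\mathrm{def}}{=} (\mathcal{X}^{(v)})^N$ and $\underline{Y}_t^{(v)} \in \underline{\mathcal{Y}}^{(v)} \stackrel{\mathrm{def}}{=} (\mathcal{Y}^{(v)})^N$ are the $N$-dimensional vectors of
channel inputs and channel outputs, respectively, for node $v$ at time $t$. The variables in the $\ell$-th layer
of the stack are denoted by an argument $\ell$; for example, $\underline{W}^{(u \to v)}(\ell)$ is the message from node $u$ to
node $v$ in the $\ell$-th layer of the stack and $\underline{X}_t^{(v)}(\ell)$ is the layer-$\ell$ channel input from node $v$ at time $t$.
Since $\underline{W}^{(u \to v)}$ is an $N$-dimensional vector of messages, when $W^{(u \to v)} \in \mathcal{W}^{(u \to v)} \stackrel{\mathrm{def}}{=} \{1, \ldots, 2^{nR^{(u \to v)}}\}$
in $\mathcal{N}$, $\underline{W}^{(u \to v)} \in \underline{\mathcal{W}}^{(u \to v)} \stackrel{\mathrm{def}}{=} \{1, \ldots, 2^{nR^{(u \to v)}}\}^N$ in $\underline{\mathcal{N}}$. We therefore define the rate $R^{(u \to v)}$ for a
stacked network to be $(\log |\underline{\mathcal{W}}^{(u \to v)}|)/(nN)$; this normalization makes rate regions in a network and its
corresponding stacked network comparable.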
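
As a quick check of this normalization, written out under the assumption that logarithms are taken base 2, the stacked message alphabet built from $N$ copies of a rate-$R^{(u \to v)}$ alphabet has the same normalized rate as a single layer:

```latex
\frac{\log \left| \underline{\mathcal{W}}^{(u \to v)} \right|}{nN}
  = \frac{\log \left( 2^{nR^{(u \to v)}} \right)^{N}}{nN}
  = \frac{N \cdot nR^{(u \to v)}}{nN}
  = R^{(u \to v)}.
```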

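Definition 3 Let a network

$$\mathcal{N} \stackrel{\mathrm{def}}{=} \left( \prod_{v=1}^{m} \mathcal{X}^{(v)}, \; p(y|x), \; \prod_{v=1}^{m} \mathcal{Y}^{(v)} \right)$$

be given. Let $\underline{\mathcal{N}}$ be the $N$-fold stacked network for $\mathcal{N}$. A blocklength-$n$ solution $S(\underline{\mathcal{N}})$ to this network is
defined as a set of encoding and decoding functions

$$\underline{X}_t^{(v)} : (\underline{\mathcal{Y}}^{(v)})^{t-1} \times \prod_{v'=1}^{m} \underline{\mathcal{W}}^{(v \to v')} \to \underline{\mathcal{X}}^{(v)}$$

$$\hat{\underline{W}}^{(u \to v)} : (\underline{\mathcal{Y}}^{(v)})^{n} \times \prod_{v'=1}^{m} \underline{\mathcal{W}}^{(v \to v')} \to \underline{\mathcal{W}}^{(u \to v)}$$

mapping $(\underline{Y}_1^{(v)}, \ldots, \underline{Y}_{t-1}^{(v)}, \underline{W}^{(v \to 1)}, \ldots, \underline{W}^{(v \to m)})$ to $\underline{X}_t^{(v)}$ for each $t \in \{1, \ldots, n\}$ and $v \in \{1, \ldots, m\}$
and mapping $(\underline{Y}_1^{(v)}, \ldots, \underline{Y}_n^{(v)}, \underline{W}^{(v \to 1)}, \ldots, \underline{W}^{(v \to m)})$ to $\hat{\underline{W}}^{(u \to v)}$ for each $u, v \in \{1, \ldots, m\}$. The solution
$S(\underline{\mathcal{N}})$ is called a $(\lambda, \mathcal{R})$-solution for $\underline{\mathcal{N}}$, denoted $(\lambda, \mathcal{R})$-$S(\underline{\mathcal{N}})$, if the encoding and decoding functions
imply

$$\Pr(\underline{W}^{(u \to v)} \neq \hat{\underline{W}}^{(u \to v)}) < \lambda$$

for all source and sink pairs $u, v$.

$^1$The vertex set is a multiset since it contains $N$ copies of each element of $\{1, \ldots, m\}$.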

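Definition 4 The rate region $\mathscr{R}(\underline{\mathcal{N}}) \subset \mathbb{R}^{m(m-1)}$ of a stacked network $\underline{\mathcal{N}}$ is the closure of all rate vectors
$\mathcal{R}$ such that a $(\lambda, \mathcal{R})$-$S(\underline{\mathcal{N}})$ solution exists for any $\lambda > 0$ and all $N$ sufficiently large.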

Theorem 1, below, shows that the rate regions for a network $\mathcal{N}$ and its corresponding stacked network
$\underline{\mathcal{N}}$ are identical. That result further demonstrates that the error probability for the stacked network can be
made to decay exponentially in the number of layers $N$. The proof builds a blocklength-$n$ solution for
network $\underline{\mathcal{N}}$ by first using a channel code to map each message $\underline{W}^{(u \to v)} \in \underline{\mathcal{W}}^{(u \to v)} = \{1, \ldots, 2^{nR^{(u \to v)}}\}^N$
to a message in alphabet $\underline{\tilde{\mathcal{W}}}^{(u \to v)} = \{1, \ldots, 2^{n\tilde{R}^{(u \to v)}}\}^N$ for some $\tilde{R}^{(u \to v)} > R^{(u \to v)}$ and then applying
the same blocklength-$n$ solution for network $\mathcal{N}$ independently in each layer of the stack. We call such
a solution a stacked solution.

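Definition 5 Let a network $\mathcal{N} \stackrel{\mathrm{def}}{=} (\prod_{v=1}^{m} \mathcal{X}^{(v)}, p(y|x), \prod_{v=1}^{m} \mathcal{Y}^{(v)})$ be given. Let $\underline{\mathcal{N}}$ be the $N$-fold stacked
network for $\mathcal{N}$. A blocklength-$n$ stacked solution $\underline{S}(\underline{\mathcal{N}})$ to network $\underline{\mathcal{N}}$ is defined as a set of mappings

$$X_t^{(v)} : (\mathcal{Y}^{(v)})^{t-1} \times \prod_{v'=1}^{m} \mathcal{W}^{(v \to v')} \to \mathcal{X}^{(v)}$$

$$\hat{W}^{(u \to v)} : (\mathcal{Y}^{(v)})^{n} \times \prod_{v'=1}^{m} \mathcal{W}^{(v \to v')} \to \mathcal{W}^{(u \to v)}$$

such that

$$\underline{X}_t^{(v)}(\ell) = X_t^{(v)}\left( \underline{Y}_1^{(v)}(\ell), \ldots, \underline{Y}_{t-1}^{(v)}(\ell), \underline{W}^{(v \to 1)}(\ell), \ldots, \underline{W}^{(v \to m)}(\ell) \right)$$

for each $u, v \in \{1, \ldots, m\}$, $t \in \{1, \ldots, n\}$, and $\ell \in \{1, \ldots, N\}$. The solution $\underline{S}(\underline{\mathcal{N}})$ is called a stacked
$(\lambda, \mathcal{R})$-solution, denoted $(\lambda, \mathcal{R})$-$\underline{S}(\underline{\mathcal{N}})$, if the specified mappings imply

$$\Pr(\underline{W}^{(u \to v)} \neq \hat{\underline{W}}^{(u \to v)}) < \lambda$$

for all source and sink pairs $u, v$.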
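
The layer-by-layer structure of a stacked solution can be sketched as a wrapper that applies one single-layer encoder to each layer's own history. The code below is a minimal illustration under assumptions of ours: the function names and the toy parity encoder are hypothetical, and the outer channel code across layers mentioned in the text is not modeled.

```python
def make_stacked_encoder(single_layer_encoder):
    """Build the time-t encoder of a stacked solution from a single-layer encoder.

    past_outputs[l] holds layer l's outputs Y_1(l), ..., Y_{t-1}(l) and
    messages[l] holds layer l's outgoing messages; layers never mix.
    """
    def stacked_encoder(past_outputs, messages):
        return [single_layer_encoder(y, w) for y, w in zip(past_outputs, messages)]
    return stacked_encoder


def parity_encoder(past_outputs, messages):
    """Toy single-layer encoder: parity of past outputs plus the first message bit."""
    return (sum(past_outputs) + messages[0]) % 2


if __name__ == "__main__":
    encoder = make_stacked_encoder(parity_encoder)
    print(encoder([(0, 1), (1, 1), ()], [(1,), (0,), (1,)]))
```

Because each layer's input depends only on that layer's outputs and messages, the layers behave as independent uses of the same single-layer code, which is the property the proof of Theorem 1 relies on.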

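Theorem 1 The rate regions $\mathscr{R}(\mathcal{N})$ and $\mathscr{R}(\underline{\mathcal{N}})$ are identical, and for each $\mathcal{R} \in \mathrm{int}(\mathscr{R}(\mathcal{N}))$, there exists a
sequence of $(2^{-N\delta}, \mathcal{R})$-$\underline{S}(\underline{\mathcal{N}})$ stacked solutions for $\underline{\mathcal{N}}$ for some $\delta > 0$.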
Fig. 4. A blocklength-$n$ solution $S(\underline{\mathcal{N}})$ for network $\underline{\mathcal{N}}$ can be operated with the same error probability over $nN$ time
steps in $\mathcal{N}$. (a) Inputs and outputs at time $t$ of the $N$ copies of node $v$ in $\underline{\mathcal{N}}$. (b) Inputs and outputs of node $v$ at times
$(t-1)N+1, \ldots, tN$ in $\mathcal{N}$. Vectors $(X^{(v)}_{(t-1)N+1}, \ldots, X^{(v)}_{tN})^T$ and $(Y^{(v)}_{(t-1)N+1}, \ldots, Y^{(v)}_{tN})^T$ in $\mathcal{N}$ play
the same role as vectors $\underline{X}_t^{(v)}$ and $\underline{Y}_t^{(v)}$ in $\underline{\mathcal{N}}$.



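Proof. We first show that $\mathscr{R}(\underline{\mathcal{N}}) \subseteq \mathscr{R}(\mathcal{N})$. Perhaps surprisingly, this turns out to be the easier part of
the proof. Let $\mathcal{R} \in \mathrm{int}(\mathscr{R}(\underline{\mathcal{N}}))$. Then for any $\lambda \in (0, 1]$, there exists a $(\lambda, \mathcal{R})$-$S(\underline{\mathcal{N}})$ solution to the stacked
network $\underline{\mathcal{N}}$. Let $n$ be the blocklength of $S(\underline{\mathcal{N}})$. The argument that follows uses $S(\underline{\mathcal{N}})$ to build a blocklength-$nN$
$(\lambda, \mathcal{R})$-$S(\mathcal{N})$ solution for network $\mathcal{N}$. Roughly, the operations performed at time $t$ by the $N$ copies of
node $v$ in $S(\underline{\mathcal{N}})$ are performed by the single copy of node $v$ at times $(t-1)N+1, \ldots, tN$ in $S(\mathcal{N})$, as
shown in Figure 4. This gives the desired result since the error probability and rate of $S(\mathcal{N})$ on $\mathcal{N}$ equal the
error probability and rate of $S(\underline{\mathcal{N}})$ on $\underline{\mathcal{N}}$.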

To make the argument formal, for each $(u, v)$, let

$$f^{(u \to v)} : \{1, \ldots, 2^{NnR^{(u \to v)}}\} \to \{1, \ldots, 2^{nR^{(u \to v)}}\}^N$$

be the natural one-to-one mapping from a single sequence of $NnR^{(u \to v)}$ bits to $N$ consecutive subsequences
each of $nR^{(u \to v)}$ bits. Let $g^{(u \to v)}$ be the inverse of $f^{(u \to v)}$. We use $f^{(u \to v)}$ to map messages from the
message alphabet of the rate-$R^{(u \to v)}$, blocklength-$Nn$ code $S(\mathcal{N})$ to the message alphabet for the $N$-layer,
rate-$R^{(u \to v)}$, blocklength-$n$ code $S(\underline{\mathcal{N}})$. The mapping is one-to-one since in each scenario the total number
of bits transmitted from node $u$ to node $v$ is $NnR^{(u \to v)}$. For each $t \in \{1, \ldots, n\}$, let

$$\mathbf{X}^{(v)}(t) = (X^{(v)}_{(t-1)N+1}, \ldots, X^{(v)}_{tN})^T$$

$$\mathbf{Y}^{(v)}(t) = (Y^{(v)}_{(t-1)N+1}, \ldots, Y^{(v)}_{tN})^T$$

denote vectors containing the channel inputs and outputs at node $v$ for $N$ consecutive time steps beginning
at time $(t-1)N+1$. This is a simple blocking of symbols into vectors, with superscript $T$ denoting vector
transpose. We define the solution $S(\mathcal{N})$ as

$$\mathbf{X}^{(v)}(t) = \underline{X}_t^{(v)}\left( \mathbf{Y}^{(v)}(1), \ldots, \mathbf{Y}^{(v)}(t-1), f^{(v \to 1)}(W^{(v \to 1)}), \ldots, f^{(v \to m)}(W^{(v \to m)}) \right)$$

$$\hat{W}^{(u \to v)} = g^{(u \to v)}\left( \hat{\underline{W}}^{(u \to v)}\left( \mathbf{Y}^{(v)}(1), \ldots, \mathbf{Y}^{(v)}(n), f^{(v \to 1)}(W^{(v \to 1)}), \ldots, f^{(v \to m)}(W^{(v \to m)}) \right) \right).$$

Since $S(\mathcal{N})$ satisfies the causality constraints and operates precisely the mappings from $S(\underline{\mathcal{N}})$ on $\mathcal{N}$, the
solution $S(\mathcal{N})$ achieves the same rate and error probability on $\mathcal{N}$ as the solution $S(\underline{\mathcal{N}})$ achieves on $\underline{\mathcal{N}}$.
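
The bookkeeping in this direction of the proof amounts to splitting one long message into per-layer pieces and mapping layer indices to time indices. The sketch below is an illustration under assumptions of ours: messages are represented as lists of bits, layers and times are 1-based, and the function names are hypothetical.

```python
def split_message(bits, num_layers):
    """Plays the role of f: one sequence of N*n*R bits -> N consecutive subsequences."""
    if len(bits) % num_layers != 0:
        raise ValueError("message length must be a multiple of the number of layers")
    size = len(bits) // num_layers
    return [bits[k * size:(k + 1) * size] for k in range(num_layers)]


def join_message(chunks):
    """Plays the role of g, the inverse of f: concatenate the per-layer subsequences."""
    return [bit for chunk in chunks for bit in chunk]


def time_in_network(stacked_time, layer, num_layers):
    """Time step in the unstacked network that plays layer `layer` at stacked time t."""
    return (stacked_time - 1) * num_layers + layer


if __name__ == "__main__":
    message = [1, 0, 1, 1, 0, 0]
    layers = split_message(message, 3)
    print(layers, join_message(layers) == message)
    print([time_in_network(t, layer, 3) for t in (1, 2) for layer in (1, 2, 3)])
```

Every time step from $1$ to $nN$ is used exactly once, and the symbols of block $t$ depend only on outputs from earlier blocks, which is why the causality constraint is preserved.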

For the converse, the job is more difficult. A solution $(\lambda, \mathcal{R})$-$S(\mathcal{N})$ needs to achieve an error probability
of at most $\lambda$ for every $(u, v)$ pair in a network. A solution $(\lambda, \mathcal{R})$-$S(\underline{\mathcal{N}})$ also needs to achieve an error
probability of at most $\lambda$ for each $(u, v)$, but here the error event is a union over errors in each of the $N$
layers with $N$ growing arbitrarily large.

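Let $\mathcal{R} \in \mathrm{int}(\mathscr{R}(\mathcal{N}))$, and fix some $\tilde{\mathcal{R}} \in \mathrm{int}(\mathscr{R}(\mathcal{N}))$ for which $\tilde{R}^{(u\to v)} > R^{(u\to v)}$ for all $u, v$. We use a
solution of rate $\tilde{\mathcal{R}}$ on $\mathcal{N}$ to build a stacked solution of rate $\mathcal{R}$ on $\underline{\mathcal{N}}$. Set $\rho = \min_{u,v} (\tilde{R}^{(u\to v)} - R^{(u\to v)})$.
For any $p \in [0, 1]$, let $h(p) = -p \log p - (1 - p) \log(1 - p)$ be the binary entropy function. For reasons
that will become clear later, we wish to find constants $\lambda$ and $n$ satisfying

$$\max_{u,v} \tilde{R}^{(u\to v)} \lambda + h(\lambda)/n < \rho$$

such that there exists a $(\lambda, \tilde{\mathcal{R}})$-$\mathcal{S}(\mathcal{N})$ solution of blocklength $n$. This is possible because $\tilde{\mathcal{R}} \in \mathrm{int}(\mathscr{R}(\mathcal{N}))$
implies that for any $\lambda \in (0, 1]$ and all $n$ sufficiently large there exists a blocklength-$n$ $(\lambda, \tilde{\mathcal{R}})$-$\mathcal{S}(\mathcal{N})$ solution.
We therefore meet the desired constraint by choosing $\lambda$ to be small (say $\lambda = \rho/(2 \max_{u,v} \tilde{R}^{(u\to v)})$) and then
choosing $n$ sufficiently large. The chosen $n$ will be the blocklength of our code for all values of $N$.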
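The parameter choice just described can be checked numerically. The sketch below is only an illustration: the function names and the example rate values are assumptions, not part of the paper. It picks $\lambda = \rho/(2\max\tilde{R})$ and then the smallest $n$ meeting the numeric constraint $\max\tilde{R}\,\lambda + h(\lambda)/n < \rho$. It does not verify that a blocklength-$n$ solution with error $\lambda$ actually exists, which the proof obtains by taking $n$ large enough.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def choose_lambda_and_n(rates_tilde, rates):
    """Return (rho, lam, n) satisfying max(rates_tilde)*lam + h(lam)/n < rho.

    rates_tilde and rates are hypothetical lists holding the per-pair rates
    of the larger rate vector and the target rate vector, in the same order.
    """
    rho = min(rt - r for rt, r in zip(rates_tilde, rates))
    if rho <= 0:
        raise ValueError("every entry of rates_tilde must exceed the target rate")
    r_max = max(rates_tilde)
    lam = rho / (2 * r_max)  # makes r_max * lam equal to rho / 2
    # The remaining budget rho / 2 must strictly exceed h(lam) / n.
    n = math.floor(2 * binary_entropy(lam) / rho) + 1
    return rho, lam, n


rho, lam, n = choose_lambda_and_n([1.0, 0.8], [0.9, 0.6])
print(rho, lam, n)
```

Any larger $n$ also satisfies the constraint, which matters because the existence of a good blocklength-$n$ solution is only guaranteed for $n$ sufficiently large.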

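Fix a $(\lambda, \tilde{\mathcal{R}})$-$\mathcal{S}(\mathcal{N})$ solution of blocklength $n$. For this solution, denote the message set
by $\tilde{\mathcal{W}}^{(u\to v)} \stackrel{\mathrm{def}}{=} \{1, \dots, 2^{n\tilde{R}^{(u\to v)}}\}$, and let $\tilde{W}^{(u\to v)}$ and $\hat{\tilde{W}}^{(u\to v)}$ be the message and its reconstruction,
respectively, using the fixed $(\lambda, \tilde{\mathcal{R}})$-$\mathcal{S}(\mathcal{N})$ solution. We use $\mathcal{S}(\mathcal{N})$ as the solution applied independently in
each layer of our stacked solution.

While solution $\mathcal{S}(\mathcal{N})$ yields error probability no greater than $\lambda$ in each layer of the stack, the error probability
over all $N$ layers may still be high. The stacked solution's channel codes are included to remedy
this problem. For each $(u, v)$, the layers of the stack behave like $N$ independent instances of channel
$(\tilde{\mathcal{W}}^{(u\to v)}, p(\hat{\tilde{w}}^{(u\to v)} | \tilde{w}^{(u\to v)}), \tilde{\mathcal{W}}^{(u\to v)})$, where $p(\hat{\tilde{w}}^{(u\to v)} | \tilde{w}^{(u\to v)}) = \Pr(\hat{\tilde{W}}^{(u\to v)} = \hat{\tilde{w}}^{(u\to v)} | \tilde{W}^{(u\to v)} = \tilde{w}^{(u\to v)})$
under solution $\mathcal{S}(\mathcal{N})$. By assumption, $\tilde{W}^{(u\to v)}$ is uniformly distributed on $\tilde{\mathcal{W}}^{(u\to v)}$, so this channel has
mutual information

$$I(\tilde{W}^{(u\to v)}; \hat{\tilde{W}}^{(u\to v)}) \ge n\tilde{R}^{(u\to v)} - \left(\lambda n\tilde{R}^{(u\to v)} + h(\lambda)\right)$$

by Fano's inequality. Note that the desired rate per channel use $nR^{(u\to v)}$ is strictly less than the channel's
mutual information, precisely

$$I(\tilde{W}^{(u\to v)}; \hat{\tilde{W}}^{(u\to v)}) - nR^{(u\to v)} \ge n\rho - \left(\lambda n\tilde{R}^{(u\to v)} + h(\lambda)\right) > 0,$$

owing to our earlier choice of $\lambda$ and $n$. We therefore design a $(2^{NnR^{(u\to v)}}, N)$ channel code for each $(u, v)$
by choosing $2^{NnR^{(u\to v)}}$ blocklength-$N$ codewords uniformly from $\underline{\tilde{\mathcal{W}}}^{(u\to v)}$, where $\underline{\tilde{\mathcal{W}}}^{(u\to v)} \stackrel{\mathrm{def}}{=} (\tilde{\mathcal{W}}^{(u\to v)})^N$.
The channel encoder and channel decoder specify, respectively, the mapping from each message $\underline{W}^{(u\to v)}$ to its codeword $\underline{\tilde{W}}^{(u\to v)}$ and the mapping from the received word back to a message reconstruction for
our stacked solution. Applying the strong coding theorem for discrete memoryless channels (Theorem 5.6.2 of the reference cited there),
the expected error probability of this randomly drawn code decays as $2^{-N\delta}$. The value $\delta$ is an increasing
function of the gap $\min_{u,v}[I(\tilde{W}^{(u\to v)}; \hat{\tilde{W}}^{(u\to v)}) - nR^{(u\to v)}]$. Since the expected error probability (with
respect to the random channel code designs for all messages $\underline{W}^{(u\to v)}$) decays as $2^{-N\delta}$, there exists a single
instance of all channel codes that does at least as well. Thus the stacked solution $\underline{\mathcal{S}}(\underline{\mathcal{N}})$ that first channel
codes each message $\underline{W}^{(u\to v)}$ to $\underline{\tilde{W}}^{(u\to v)}$ and then applies the blocklength-$n$ solution $\mathcal{S}(\mathcal{N})$ independently
in each layer of the stack achieves error probability no greater than $2^{-N\delta}$ for $N$ sufficiently large. ■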
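To see why the outer channel code is needed, the following sketch compares the error probability across the stack without and with it. It rests on simplifying assumptions that are not in the paper: each layer errs independently with probability exactly $\lambda$, and the exponent `delta` is a hypothetical illustrative value rather than one derived from a specific channel.

```python
def uncoded_stack_error(lam: float, num_layers: int) -> float:
    """Probability that at least one of num_layers independent layers errs,
    assuming each layer errs with probability exactly lam."""
    return 1.0 - (1.0 - lam) ** num_layers


def coded_stack_error_bound(delta: float, num_layers: int) -> float:
    """Bound 2^(-N * delta) from the random-coding argument (delta is assumed)."""
    return 2.0 ** (-num_layers * delta)


for N in (1, 10, 100, 1000):
    print(N, uncoded_stack_error(0.01, N), coded_stack_error_bound(0.05, N))
```

Under these assumptions the uncoded error approaches 1 as the number of layers grows, while the coded bound decays exponentially in $N$.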

Since the proof of Theorem 1 shows that stacked solutions can obtain all rates in the interior of $\mathscr{R}(\mathcal{N})$,
we restrict our attention to stacked codes going forward; there is no loss of generality in this restriction.

The arguments that build on Theorem 1 later in the paper employ not the single instance of the code
chosen at the end of the proof but the random code design that precedes it. This random code design is
combined with a collection of other random code designs. Choosing the instances of all random codes
jointly guarantees good end-to-end performance.

Fig. 5. (a) A network $\mathcal{N}$ and (b) the corresponding network $\mathcal{N}_R$ that replaces channel $(\mathcal{X}^{(i,1)}, p(y^{(j,1)} | x^{(i,1)}), \mathcal{Y}^{(j,1)})$ by a
capacity-$R$ noiseless bit pipe $(\{0,1\}^R, \delta(y^{(j,1)} - x^{(i,1)}), \{0,1\}^R)$.

To understand the implications of the given random code design, let

$$X_t(\ell) \stackrel{\mathrm{def}}{=} \left(X_t^{(1)}(\ell), \dots, X_t^{(m)}(\ell)\right)$$

$$Y_t(\ell) \stackrel{\mathrm{def}}{=} \left(Y_t^{(1)}(\ell), \dots, Y_t^{(m)}(\ell)\right)$$

be the vectors of all channel inputs and all channel outputs in layer £ of the stacked network at time t. 
For each $(u, v)$, the messages $W^{(u\to v)}(1), \dots, W^{(u\to v)}(N)$ input to the stacked solution are independent
and identically distributed (i.i.d.). Since each channel code's codewords are drawn from the uniform
distribution on $\underline{\tilde{\mathcal{W}}}^{(u\to v)}$, the coded messages $\tilde{W}^{(u\to v)}(1), \dots, \tilde{W}^{(u\to v)}(N)$ for a random code are also
i.i.d. and uniform. Finally, since the solutions in the layers of $\underline{\mathcal{N}}$ are independent and identical,

$$(X_t(1), Y_t(1)), \dots, (X_t(N), Y_t(N))$$

are also i.i.d. for each t. This structure allows us to apply typicality arguments across the layers of the 
network for a fixed time t. 

V. Network Equivalence 
The equivalence result derived in this section relates the rate region of a network

$$\mathcal{N} = \left(\mathcal{X}^{-(i,1)} \times \mathcal{X}^{(i,1)},\ p(y^{(j,1)} | x^{(i,1)})\, p(y^{-(j,1)} | x^{-(i,1)}),\ \mathcal{Y}^{-(j,1)} \times \mathcal{Y}^{(j,1)}\right)$$

to that of a network $\mathcal{N}_R$ that replaces channel $(\mathcal{X}^{(i,1)}, p(y^{(j,1)} | x^{(i,1)}), \mathcal{Y}^{(j,1)})$ by a capacity-$R$ noiseless
bit pipe, here denoted by $(\{0,1\}^R, \delta(y^{(j,1)} - x^{(i,1)}), \{0,1\}^R)$. Thus

$$\mathcal{N}_R \stackrel{\mathrm{def}}{=} \left(\mathcal{X}^{-(i,1)} \times \{0,1\}^R,\ \delta(y^{(j,1)} - x^{(i,1)})\, p(y^{-(j,1)} | x^{-(i,1)}),\ \mathcal{Y}^{-(j,1)} \times \{0,1\}^R\right).$$

Employing a common abuse of notation, we allow non-integer values of $R$ to designate capacitated bit
pipes that require more than a single channel use to deliver some integer number of bits. Applying the
stacking approach of Theorem 1, the arguments that follow transmit information over the $N$ copies of
each channel in an $N$-fold stacked network; thus we have channel $(\{0,1\}^{\lfloor NR \rfloor}, \delta(\underline{y}^{(j,1)} - \underline{x}^{(i,1)}), \{0,1\}^{\lfloor NR \rfloor})$
in the $N$-fold stacked network $\underline{\mathcal{N}}_R$. As usual, $N$ is allowed to grow without bound, so the transmission
of $\lfloor NR \rfloor$ bits over $N$ channel uses achieves rate arbitrarily close to $R$.
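The last claim is simple arithmetic: sending $\lfloor NR \rfloor$ bits over $N$ uses gives rate $\lfloor NR \rfloor / N$, which falls short of $R$ by less than $1/N$. A minimal sketch (the function name and the example value of $R$ are illustrative assumptions):

```python
import math


def stacked_pipe_rate(R: float, N: int) -> float:
    """Rate, in bits per channel use, of sending floor(N * R) bits over N uses."""
    return math.floor(N * R) / N


R = 0.7071  # hypothetical non-integer pipe capacity
for N in (1, 10, 100, 1000):
    rate = stacked_pipe_rate(R, N)
    print(N, rate, R - rate)  # the shortfall is always below 1 / N
```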

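Before turning to the equivalence result, we prove the continuity of capacity region $\mathscr{R}(\mathcal{N}_R)$ in $R$ for all
$R > 0$. Precisely, for any $R > 0$ and $\delta < R$, we define

$$\epsilon(\delta) = \max_{\mathcal{R} \in \mathscr{R}(\mathcal{N}_{R+\delta})}\ \min_{\mathcal{R}' \in \mathscr{R}(\mathcal{N}_{R-\delta})} \|\mathcal{R} - \mathcal{R}'\|_\infty$$

to be the worst-case $\ell_\infty$-norm between a point in $\mathscr{R}(\mathcal{N}_{R+\delta})$ and its closest point in $\mathscr{R}(\mathcal{N}_{R-\delta})$. We then
show that for any $\epsilon > 0$, there exists a $\delta > 0$ for which $\epsilon(\delta) < \epsilon$. Continuity of the rate region at $R = 0$
remains an open problem for most networks [18], [19]. The subtle underlying question here is whether
a number of bits that grows sublinearly in the coding dimension can change the network capacity.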

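Lemma 2 Rate region $\mathscr{R}(\mathcal{N}_R)$ is continuous in $R$ for all $R > 0$.

Proof. By Theorem 1 it suffices to prove that $\mathscr{R}(\underline{\mathcal{N}}_R)$ is continuous in $R$. Note that $\mathscr{R}(\underline{\mathcal{N}}_R)$ is non-decreasing
in $R$; that is, $R \le R'$ implies $\mathscr{R}(\underline{\mathcal{N}}_R) \subseteq \mathscr{R}(\underline{\mathcal{N}}_{R'})$, since any $(\lambda, \mathcal{R})$-$\mathcal{S}(\underline{\mathcal{N}}_R)$ solution for $N$-fold
stacked network $\underline{\mathcal{N}}_R$ can be run with the same error probability on $N$-fold stacked network $\underline{\mathcal{N}}_{R'}$.
Thus for any $R > 0$ and any $\delta \in (0, R)$, $\mathscr{R}(\underline{\mathcal{N}}_{R-\delta}) \subseteq \mathscr{R}(\underline{\mathcal{N}}_R) \subseteq \mathscr{R}(\underline{\mathcal{N}}_{R+\delta})$. Fix any $\delta > 0$ and
$\mathcal{R} \in \mathrm{int}(\mathscr{R}(\underline{\mathcal{N}}_{R+\delta}))$. For any $\lambda > 0$ and all $N$ sufficiently large there exists a $(\lambda, \mathcal{R})$-$\mathcal{S}(\underline{\mathcal{N}}_{R+\delta})$ solution
for the $N$-fold stacked network $\underline{\mathcal{N}}_{R+\delta}$. Recall that $\mathcal{N}_{R+\delta}$ and $\mathcal{N}_{R-\delta}$ differ only in the capacity of the bit
pipe from node $i$ to node $j$. Thus any solution $\mathcal{S}(\underline{\mathcal{N}}_{R+\delta})$ that achieves error probability $\lambda$ on $N$-fold stacked
network $\underline{\mathcal{N}}_{R+\delta}$ can be run with the same error probability on $N'$-fold stacked network $\underline{\mathcal{N}}_{R-\delta}$ provided

$$N'(R - \delta) \ge N(R + \delta).$$

This is accomplished by operating solution $\mathcal{S}(\underline{\mathcal{N}}_{R+\delta})$ unchanged across the first $N$ copies of the channel
$(\mathcal{X}^{-(i,1)}, p(y^{-(j,1)} | x^{-(i,1)}), \mathcal{Y}^{-(j,1)})$ in $\underline{\mathcal{N}}_{R-\delta}$ and sending the $N(R + \delta)$ bits intended for transmission
across $N$ bit pipes of rate $R + \delta$ in $\underline{\mathcal{N}}_{R+\delta}$ across the $N'$ bit pipes of rate $R - \delta$ in $\underline{\mathcal{N}}_{R-\delta}$. Set $N' =
\lceil N(R + \delta)/(R - \delta) \rceil$. Then the rate of the resulting code is

$$\mathcal{R}' = \frac{N}{N'}\, \mathcal{R} \ge \frac{N(R - \delta)}{N(R + \delta) + R - \delta}\, \mathcal{R}.$$

Since $\mathcal{R}$ and $R$ are fixed, the difference

$$\mathcal{R} - \mathcal{R}' \le \mathcal{R} \cdot \frac{2N\delta + R - \delta}{N(R + \delta) + R - \delta}$$

can be made arbitrarily small by letting $N$ grow and $\delta$ approach 0. Since $\mathcal{R}$ is arbitrary, we have the desired
result. ■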
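The re-stacking step in the proof of Lemma 2 is easy to check numerically. The sketch below (function name and example values are illustrative assumptions) computes $N' = \lceil N(R+\delta)/(R-\delta) \rceil$, the rate scaling $N/N'$, and the fractional loss bound $(2N\delta + R - \delta)/(N(R+\delta) + R - \delta)$ that appears in the final inequality.

```python
import math


def restack(N: int, R: float, delta: float):
    """Return (N_prime, scale, loss_bound) for moving a code from pipe rate
    R + delta on N layers to pipe rate R - delta on N_prime layers."""
    if not 0 < delta < R:
        raise ValueError("need 0 < delta < R")
    n_prime = math.ceil(N * (R + delta) / (R - delta))
    scale = N / n_prime  # the achieved rate is scale times the original rate
    loss_bound = (2 * N * delta + R - delta) / (N * (R + delta) + R - delta)
    return n_prime, scale, loss_bound


n_prime, scale, loss_bound = restack(100, 1.0, 0.25)
print(n_prime, 1 - scale, loss_bound)
```

With a large $N$ and small $\delta$ the bound approaches $2\delta/(R+\delta)$, which is the sense in which the loss can be made arbitrarily small.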

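The following lemma derives a lower bound on $\mathscr{R}(\mathcal{N})$.
Lemma 3 Consider a pair of networks

$$\mathcal{N} = \left(\mathcal{X}^{-(i,1)} \times \mathcal{X}^{(i,1)},\ p(y^{(j,1)} | x^{(i,1)})\, p(y^{-(j,1)} | x^{-(i,1)}),\ \mathcal{Y}^{-(j,1)} \times \mathcal{Y}^{(j,1)}\right)$$

$$\mathcal{N}_C = \left(\mathcal{X}^{-(i,1)} \times \{0,1\}^C,\ \delta(y^{(j,1)} - x^{(i,1)})\, p(y^{-(j,1)} | x^{-(i,1)}),\ \mathcal{Y}^{-(j,1)} \times \{0,1\}^C\right),$$

where $C = \max_{p(x^{(i,1)})} I(X^{(i,1)}; Y^{(j,1)})$ is the capacity of channel $(\mathcal{X}^{(i,1)}, p(y^{(j,1)} | x^{(i,1)}), \mathcal{Y}^{(j,1)})$. Then

$$\mathscr{R}(\mathcal{N}_C) \subseteq \mathscr{R}(\mathcal{N}).$$

Proof. The following proof shows that $\mathscr{R}(\mathcal{N}_R) \subseteq \mathscr{R}(\mathcal{N})$ for all $R < C$. This shows that $\bigcup_{R<C} \mathscr{R}(\mathcal{N}_R) \subseteq
\mathscr{R}(\mathcal{N})$, which gives the desired result by Lemma 2 and the closure in the definition of $\mathscr{R}(\mathcal{N})$. Applying
Theorem 1, for each $R < C$ we show that $\mathscr{R}(\mathcal{N}_R) \subseteq \mathscr{R}(\mathcal{N})$ by showing that $\mathscr{R}(\underline{\mathcal{N}}_R) \subseteq \mathscr{R}(\underline{\mathcal{N}})$.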

Fix any $R < C$, $\mathcal{R} \in \mathrm{int}(\mathscr{R}(\underline{\mathcal{N}}_R))$, and $\lambda > 0$. We first use the argument from the proof of Theorem 1
to build a sequence of rate-$\mathcal{R}$ solutions $\underline{\mathcal{S}}(\underline{\mathcal{N}}_R)$ with error probability no greater than $2^{-N\delta}$ for all $N$
sufficiently large. Recall that only the channel code on the messages $\underline{W}^{(u\to v)}$ changes with $N$. Thus for
all $N \ge 1$, solution $\underline{\mathcal{S}}(\underline{\mathcal{N}}_R)$ applies the same solution $\mathcal{S}(\mathcal{N}_R)$ in each layer of the stack. Let $n$ be the
blocklength of code $\mathcal{S}(\mathcal{N}_R)$ (and therefore the blocklength of $\underline{\mathcal{S}}(\underline{\mathcal{N}}_R)$ for all $N$).

Since $R < C$, $\lambda > 0$, and $n$ are fixed, there exists a sequence of channel codes $\{(\alpha_N, \beta_N)\}_{N=1}^{\infty}$ for
channel $(\mathcal{X}^{(i,1)}, p(y^{(j,1)} | x^{(i,1)}), \mathcal{Y}^{(j,1)})$ with encoders $\alpha_N$, decoders $\beta_N$, and maximal error probability

$$\max_{w} \Pr\left(\beta_N(\underline{Y}^{(j,1)}) \ne w \,\middle|\, \underline{X}^{(i,1)} = \alpha_N(w)\right) < \lambda/(2n)$$

for all $N$ sufficiently large (see footnote 2).

2 We here divide by $n$ since the channel code will be applied $n$ times, once for each instant in time for this blocklength-$n$
code. Application of the union bound then gives an error probability over these $n$ time steps.

Fig. 6. Operation of node $i$ at time $t$ and node $j$ at time $t + 1$ in solutions (a) $\underline{\mathcal{S}}(\underline{\mathcal{N}}_R)$ and (b) $\mathcal{S}(\underline{\mathcal{N}})$. We show the nodes at
different times since the output $\underline{\tilde{X}}_t^{(i)}$ from node $i$ at time $t$ cannot influence the encoder at node $j$ until time $t + 1$ (due to the
causality constraint).

The next step is to build a solution $\mathcal{S}(\underline{\mathcal{N}})$ for $N$-fold stacked network $\underline{\mathcal{N}}$. Solution $\mathcal{S}(\underline{\mathcal{N}})$ operates $\underline{\mathcal{S}}(\underline{\mathcal{N}}_R)$
across $\underline{\mathcal{N}}$ by using channel code $(\alpha_N, \beta_N)$ at each time $t$ to transmit across the $N$ copies of channel
$(\mathcal{X}^{(i,1)}, p(y^{(j,1)} | x^{(i,1)}), \mathcal{Y}^{(j,1)})$ in $\underline{\mathcal{N}}$, as shown in Figure 6. Precisely, at time $t$, node $v$ performs any necessary
channel decoding on the channel output to give

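$$\underline{\tilde{Y}}_t^{(v)} = \begin{cases} \left(\beta_N(\underline{Y}_t^{(j,1)}),\ \underline{Y}_t^{(j,2)}\right) & \text{if } v = j \\ \underline{Y}_t^{(v)} & \text{if } v \ne j, \end{cases}$$

then applies the node encoders from $\underline{\mathcal{S}}(\underline{\mathcal{N}}_R)$ as

$$\underline{X}_t^{(v)} = \underline{X}_t^{(v)}\left(\underline{\tilde{Y}}_1^{(v)}, \dots, \underline{\tilde{Y}}_{t-1}^{(v)}, \underline{W}^{(v\to 1)}, \dots, \underline{W}^{(v\to m)}\right),$$

and finally applies any necessary channel encoding as

$$\underline{\tilde{X}}_t^{(v)} = \begin{cases} \left(\alpha_N(\underline{X}_t^{(i,1)}),\ \underline{X}_t^{(i,2)}\right) & \text{if } v = i \\ \underline{X}_t^{(v)} & \text{if } v \ne i \end{cases}$$

before transmission across the channel. At time $n$, node $v$ applies the decoder from $\underline{\mathcal{S}}(\underline{\mathcal{N}}_R)$ to give

$$\underline{\hat{W}}^{(u\to v)} = \underline{\hat{W}}^{(u\to v)}\left(\underline{\tilde{Y}}_1^{(v)}, \dots, \underline{\tilde{Y}}_n^{(v)}, \underline{W}^{(v\to 1)}, \dots, \underline{W}^{(v\to m)}\right).$$

To bound the error probability, note that two things can go wrong. Either the channel code fails at one or
more time steps, or the channel code succeeds at all $n$ time steps but the code fails anyway. If the channel
code $(\alpha_N, \beta_N)$ succeeds at all times $t \in \{1, \dots, n\}$, then the conditional probability of an error given
$\underline{W} = \underline{w}$ is precisely what it would have been for the original code. Let $E_t$ denote the event that the channel
code fails at time $t$. Then we bound the error probability as

$$\Pr(\underline{\hat{W}} \ne \underline{W}) \stackrel{(a)}{\le} \sum_{t=1}^{n} \Pr(E_t) + \sum_{\underline{w}} \Pr\left(\underline{\hat{W}} \ne \underline{w} \,\middle|\, \{\underline{W} = \underline{w}\} \cap \bigcap_{t=1}^{n} E_t^c\right) \Pr\left(\{\underline{W} = \underline{w}\} \cap \bigcap_{t=1}^{n} E_t^c\right)$$

$$\stackrel{(b)}{\le} n \cdot \frac{\lambda}{2n} + 2^{-N\delta},$$

which is less than $\lambda$ for all $N$ sufficiently large. Inequality (a) follows from the union bound. Inequality
(b) follows from the error probability bound for the channel code and from the observation that
$\Pr(\{\underline{W} = \underline{w}\} \cap \bigcap_{t=1}^{n} E_t^c) \le \Pr(\underline{W} = \underline{w})$ for all $\underline{w}$. ■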
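The final bound in this proof has two terms: the union bound over the $n$ uses of the channel code, $n \cdot \lambda/(2n) = \lambda/2$, and the error $2^{-N\delta}$ of the underlying stacked code. The sketch below finds the smallest $N$ at which the second term drops below $\lambda/2$. The value of `delta` is a hypothetical placeholder, and the sketch assumes the channel-code bound $\lambda/(2n)$ already holds at that $N$.

```python
import math


def lemma3_error_bound(lam: float, n: int, delta: float, N: int) -> float:
    """Union bound n * lam / (2n) plus the stacked-code term 2^(-N * delta)."""
    channel_code_term = n * (lam / (2 * n))
    stacked_code_term = 2.0 ** (-N * delta)
    return channel_code_term + stacked_code_term


def min_layers(lam: float, delta: float) -> int:
    """Smallest N with 2^(-N * delta) strictly below lam / 2."""
    return math.floor(math.log2(2 / lam) / delta) + 1


N_min = min_layers(0.01, 0.1)
print(N_min, lemma3_error_bound(0.01, 5, 0.1, N_min))
```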

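Lemma 3 applies channel coding to emulate a noiseless bit pipe $(\{0,1\}^R, \delta(y^{(j,1)} - x^{(i,1)}), \{0,1\}^R)$ across
a noisy channel $(\mathcal{X}^{(i,1)}, p(y^{(j,1)} | x^{(i,1)}), \mathcal{Y}^{(j,1)})$ so that a code for $\mathcal{N}_R$ can be run across $\mathcal{N}$ with the aid of
the channel code. Theorem 4 employs a code that emulates noisy channel $(\mathcal{X}^{(i,1)}, p(y^{(j,1)} | x^{(i,1)}), \mathcal{Y}^{(j,1)})$
across noiseless bit pipe $(\{0,1\}^R, \delta(y^{(j,1)} - x^{(i,1)}), \{0,1\}^R)$ so that a code for $\mathcal{N}$ can be run across $\mathcal{N}_R$
with similar error probability.

Theorem 4 Consider a pair of networks

$$\mathcal{N} = \left(\mathcal{X}^{-(i,1)} \times \mathcal{X}^{(i,1)},\ p(y^{(j,1)} | x^{(i,1)})\, p(y^{-(j,1)} | x^{-(i,1)}),\ \mathcal{Y}^{-(j,1)} \times \mathcal{Y}^{(j,1)}\right)$$

$$\mathcal{N}_R = \left(\mathcal{X}^{-(i,1)} \times \{0,1\}^R,\ \delta(y^{(j,1)} - x^{(i,1)})\, p(y^{-(j,1)} | x^{-(i,1)}),\ \mathcal{Y}^{-(j,1)} \times \{0,1\}^R\right),$$

where $(\mathcal{X}^{(i,1)}, p(y^{(j,1)} | x^{(i,1)}), \mathcal{Y}^{(j,1)})$ is a channel of capacity $C \stackrel{\mathrm{def}}{=} \max_{p(x^{(i,1)})} I(X^{(i,1)}; Y^{(j,1)})$.

If $R > C$, then

$$\mathscr{R}(\mathcal{N}) \subseteq \mathscr{R}(\mathcal{N}_R).$$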

Proof. By Theorem 1 it suffices to show that $\mathscr{R}(\underline{\mathcal{N}}) \subseteq \mathscr{R}(\underline{\mathcal{N}}_R)$. Fix an arbitrary point $\mathcal{R} \in \mathrm{int}(\mathscr{R}(\underline{\mathcal{N}}))$
and any $\lambda > 0$. The argument that follows shows that for all $N$ sufficiently large there exists a $(\lambda, \mathcal{R})$
solution $\mathcal{S}(\underline{\mathcal{N}}_R)$ for $N$-fold stacked network $\underline{\mathcal{N}}_R$. We first define a random code design algorithm and bound
the expected error probability with respect to the random design. This random design includes random
selection of $m(m - 1)$ channel codes and random design of channel emulators for each time step. In order
to ensure good end-to-end performance, we do not choose a single instance of any of the randomly designed
codes until all of the codes are in place. At that point, we choose all codes simultaneously.

Step 1 - Choose code $\mathcal{S}(\mathcal{N})$ and define distributions $p_t(x^{(i,1)}, y^{(j,1)})$:

Recall from the proof of Theorem 1 that there exists a rate-$\mathcal{R}$ solution $\mathcal{S}(\mathcal{N})$ of some finite blocklength $n$
from which good stacked solutions $\underline{\mathcal{S}}(\underline{\mathcal{N}})$ for $N$-fold stacked network $\underline{\mathcal{N}}$ can be built for all $N$ sufficiently
large. The stacked solution applies an independent random channel code to each message $\underline{W}^{(u\to v)}$ and
then applies $\mathcal{S}(\mathcal{N})$ independently in each layer of $\underline{\mathcal{N}}$. Taking an expectation over the random channel
code designs yields expected error probability no larger than $2^{-N\delta}$ for all $N$ sufficiently large. We therefore
begin by fixing a solution $\mathcal{S}(\mathcal{N})$ as in Theorem 1 and building the corresponding stacked solution
$\underline{\mathcal{S}}(\underline{\mathcal{N}})$. For each $t \in \{1, \dots, n\}$, let $p_t(x^{(i,1)})$ be the distribution established on the input to channel
$(\mathcal{X}^{(i,1)}, p(y^{(j,1)} | x^{(i,1)}), \mathcal{Y}^{(j,1)})$ at time $t$ by solution $\mathcal{S}(\mathcal{N})$. Distribution $p_t(x^{(i,1)})$ may vary with $t$ due,
for example, to feedback in the network. Then $p_t(\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) = \prod_{\ell=1}^{N} p_t(x^{(i,1)}(\ell))\, p(y^{(j,1)}(\ell) | x^{(i,1)}(\ell))$
is the time-$t$ distribution across $(\underline{\mathcal{X}}^{(i,1)}, p(\underline{y}^{(j,1)} | \underline{x}^{(i,1)}), \underline{\mathcal{Y}}^{(j,1)})$ under solution $\underline{\mathcal{S}}(\underline{\mathcal{N}})$.



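Step 2 - Typical set definitions and properties:

Let $\epsilon = (\epsilon(1), \dots, \epsilon(n))$ be a vector of positive constants (see footnote 3), and for each $t$ define

$$\hat{A}_{\epsilon,t}^{(N)} \stackrel{\mathrm{def}}{=} \Big\{ (\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) \in \underline{\mathcal{X}}^{(i,1)} \times \underline{\mathcal{Y}}^{(j,1)} :$$
$$\left| -\tfrac{1}{N} \log p_t(\underline{x}^{(i,1)}) - H(X_t^{(i,1)}) \right| \le \epsilon(t),$$
$$\left| -\tfrac{1}{N} \log p_t(\underline{y}^{(j,1)}) - H(Y_t^{(j,1)}) \right| \le a(\epsilon, t),$$
$$\left| -\tfrac{1}{N} \log p_t(\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) - H(X_t^{(i,1)}, Y_t^{(j,1)}) \right| \le a(\epsilon, t) \Big\},$$

where $H(X_t^{(i,1)})$, $H(Y_t^{(j,1)})$, and $H(X_t^{(i,1)}, Y_t^{(j,1)})$ are the entropies on $X_t^{(i,1)}$, $Y_t^{(j,1)}$, and $(X_t^{(i,1)}, Y_t^{(j,1)})$
under $p_t(x^{(i,1)}, y^{(j,1)})$ (see footnote 4), and

$$a(\epsilon, t) \stackrel{\mathrm{def}}{=} (1 + \epsilon(t)) \cdot \inf\Big\{ \epsilon' > 0 : \Pr\Big( \left| -\tfrac{1}{N} \log p_t(\underline{Y}_t^{(j,1)}) - H(Y_t^{(j,1)}) \right| > \epsilon' \ \vee\ \left| -\tfrac{1}{N} \log p_t(\underline{X}_t^{(i,1)}, \underline{Y}_t^{(j,1)}) - H(X_t^{(i,1)}, Y_t^{(j,1)}) \right| > \epsilon' \Big) < 2^{-N6\epsilon(t)} \text{ for all } N \text{ sufficiently large} \Big\}. \quad (3)$$

3 Our parameter choice in the typical set definition varies with $t$ both to accommodate variation in $p_t(x^{(i,1)}, y^{(j,1)})$ and to
handle the cumulative impact of channel emulation at multiple time steps.

4 We use notation $H(\cdot)$ for both discrete and differential entropy. We assume that $H(X_t^{(i,1)}, Y_t^{(j,1)}) < \infty$.

(This infimum is shown to be well defined in the proof of Lemma 6 in Appendix I.) Define set

$$A_{\epsilon,t}^{(N)} \stackrel{\mathrm{def}}{=} \left\{ (\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) \in \hat{A}_{\epsilon,t}^{(N)} : p_t\big((\hat{A}_{\epsilon,t}^{(N)})^c \,\big|\, \underline{x}^{(i,1)}\big) \le 2^{-N3\epsilon(t)} \right\},$$

where $p_t((\hat{A}_{\epsilon,t}^{(N)})^c | \underline{x}^{(i,1)}) = \sum_{\underline{y}^{(j,1)} : (\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) \notin \hat{A}_{\epsilon,t}^{(N)}} p(\underline{y}^{(j,1)} | \underline{x}^{(i,1)})$. We henceforth call $A_{\epsilon,t}^{(N)}$ the typical
set. This typical set definition restricts attention to those typical channel inputs $\underline{x}^{(i,1)}$ that are most likely to
yield jointly typical channel outputs. This restriction is later useful for showing that the number of jointly
typical channel outputs for each typical channel input is roughly the same. Such a result could be obtained
more directly for finite-alphabet channels if we used strong typicality, but we here treat the general case.
Lemma 6 in Appendix I shows that

$$p_t\big((A_{\epsilon,t}^{(N)})^c\big) \le 2^{-Nc(\epsilon,t)} \quad (4)$$

for some constant $c(\epsilon, t)$ that goes to zero as $\epsilon(t)$ goes to zero and grows large as $\epsilon(t)$ grows large.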

Step 3 - Design of channel emulators:

We next design codes $(\alpha_{N,t}, \beta_{N,t})$, $t \in \{1, \dots, n\}$. The goal of the code design is to build a collection of
devices for emulating $N$ independent uses of channel $(\mathcal{X}^{(i,1)}, p(y^{(j,1)} | x^{(i,1)}), \mathcal{Y}^{(j,1)})$ over $N$ independent
uses of bit pipe $(\{0,1\}^R, \delta(y^{(j,1)} - x^{(i,1)}), \{0,1\}^R)$. The code for time $t$ emulates the channel under
input distribution $p_t(x^{(i,1)})$. Code $(\alpha_{N,t}, \beta_{N,t})$ has encoder $\alpha_{N,t} : \underline{\mathcal{X}}^{(i,1)} \to \{0,1\}^{NR}$ and decoder
$\beta_{N,t} : \{0,1\}^{NR} \to \underline{\mathcal{Y}}^{(j,1)}$. Thus $(\alpha_{N,t}, \beta_{N,t})$ is effectively a lossy source code with rate $R$ and blocklength $N$.
This source code differs from traditional source codes in that a good reproduction is not a value $\underline{\hat{X}}_t^{(i,1)}$ that
reproduces $\underline{X}_t^{(i,1)}$ to low distortion but a value $\underline{Y}_t^{(j,1)}$ that is similar statistically to the vector of outputs
observed when $\underline{X}_t^{(i,1)}$ is transmitted across $(\underline{\mathcal{X}}^{(i,1)}, p(\underline{y}^{(j,1)} | \underline{x}^{(i,1)}), \underline{\mathcal{Y}}^{(j,1)})$. Since the channel usually maps
typical inputs to jointly typical outputs, we design our source code to do the same.

First, randomly design decoder $\beta_{N,t} : \{1, \dots, 2^{NR}\} \to \underline{\mathcal{Y}}^{(j,1)}$ by drawing codewords

$$\beta_{N,t}(1), \dots, \beta_{N,t}(2^{NR}) \sim \text{i.i.d. } p_t(\underline{y}^{(j,1)}). \quad (5)$$

Then, design encoder $\alpha_{N,t} : \underline{\mathcal{X}}^{(i,1)} \to \{1, \dots, 2^{NR}\}$ as

$$\alpha_{N,t}(\underline{x}^{(i,1)}) = \begin{cases} k & \text{if } (\underline{x}^{(i,1)}, \beta_{N,t}(k)) \in A_{\epsilon,t}^{(N)} \\ 1 & \text{if } \nexists\, k \text{ s.t. } (\underline{x}^{(i,1)}, \beta_{N,t}(k)) \in A_{\epsilon,t}^{(N)}. \end{cases} \quad (6)$$

When there is more than one index $k$ for which $(\underline{x}^{(i,1)}, \beta_{N,t}(k)) \in A_{\epsilon,t}^{(N)}$, the encoder design chooses
uniformly at random among them.
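The emulator design in (5) and (6) can be illustrated with a toy sketch for a binary symmetric channel. Everything specific here is an assumption made for illustration: the Bernoulli($q$) input, the crossover probability `e`, the single tolerance `eps`, and all function names. In particular, the typicality test below is plain weak joint typicality with one tolerance, a simplification of the restricted set $A_{\epsilon,t}^{(N)}$ defined above, and indices are 0-based, so the fallback index is 0 rather than 1.

```python
import math
import random


def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def make_typicality_test(q, e, eps):
    """Simplified weak joint typicality for X ~ Bernoulli(q) sent over a BSC(e),
    with 0 < q < 1 and 0 < e < 1."""
    py1 = q * (1 - e) + (1 - q) * e  # P(Y = 1)
    hx, hy = h2(q), h2(py1)
    hxy = hx + h2(e)  # H(X, Y) = H(X) + H(Y | X)

    def is_typical(x, y):
        n = len(x)
        ones_x, ones_y = sum(x), sum(y)
        flips = sum(a != b for a, b in zip(x, y))
        lx = -(ones_x * math.log2(q) + (n - ones_x) * math.log2(1 - q)) / n
        ly = -(ones_y * math.log2(py1) + (n - ones_y) * math.log2(1 - py1)) / n
        lxy = lx - (flips * math.log2(e) + (n - flips) * math.log2(1 - e)) / n
        return abs(lx - hx) <= eps and abs(ly - hy) <= eps and abs(lxy - hxy) <= eps

    return is_typical, py1


def design_emulator(N, R, q, e, eps, rng):
    """Randomly design (encode, decode): codewords drawn i.i.d. from p(y), as in (5)."""
    is_typical, py1 = make_typicality_test(q, e, eps)
    size = 2 ** math.floor(N * R)
    codebook = [tuple(int(rng.random() < py1) for _ in range(N)) for _ in range(size)]
    chosen = {}  # the random tie-break is part of the design, so fix it per input

    def encode(x):
        if x not in chosen:
            matches = [k for k, y in enumerate(codebook) if is_typical(x, y)]
            chosen[x] = rng.choice(matches) if matches else 0  # fallback index
        return chosen[x]

    def decode(k):
        return codebook[k]

    return encode, decode, is_typical


rng = random.Random(0)
encode, decode, is_typical = design_emulator(N=12, R=0.75, q=0.5, e=0.1, eps=0.3, rng=rng)
x = tuple(rng.randrange(2) for _ in range(12))
k = encode(x)
print(k, decode(k), is_typical(x, decode(k)))
```

The example rate 0.75 exceeds the BSC(0.1) capacity $1 - h(0.1) \approx 0.53$, matching the requirement $R > C$ in Theorem 4. A blocklength of 12 is far too short for the asymptotic guarantees and is used only to keep the example fast.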

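Step 4 - Definition of solution $\mathcal{S}(\underline{\mathcal{N}}_R)$:

The next step is to employ codes $\{(\alpha_{N,t}, \beta_{N,t})\}_{t=1}^{n}$ to operate $\underline{\mathcal{S}}(\underline{\mathcal{N}})$ across network $\underline{\mathcal{N}}_R$. We begin with an
informal description of the resulting code, here denoted by $\mathcal{S}(\underline{\mathcal{N}}_R)$. For each node $v \notin \{i, j\}$, the operation
of node $v$ in $\mathcal{S}(\underline{\mathcal{N}}_R)$ is identical to the operation of node $v$ in $\underline{\mathcal{S}}(\underline{\mathcal{N}})$. Node $i$ applies its node encoder from
$\underline{\mathcal{S}}(\underline{\mathcal{N}})$ as usual and then source codes the resulting channel input transmission; the node decoder at node $i$ is
unchanged. Node $j$ source decodes the bit-pipe output before applying its usual encoder and decoder from
$\underline{\mathcal{S}}(\underline{\mathcal{N}})$. Figure 7 illustrates these operations, defined formally below.

Fig. 7. Operation of node $i$ at time $t$ and node $j$ at time $t + 1$ in solutions (a) $\underline{\mathcal{S}}(\underline{\mathcal{N}})$ and (b) $\mathcal{S}(\underline{\mathcal{N}}_R)$. We show the nodes at
different times since the output $\underline{X}_t$ from node $i$ at time $t$ cannot influence the encoder at node $j$ until time $t + 1$ (due to the
causality constraint).

For each $v \in V$ and $t \in \{1, \dots, n\}$, let $\underline{\tilde{Y}}_t^{(v)}$ be the time-$t$ channel output at node $v$ in $\mathcal{S}(\underline{\mathcal{N}}_R)$. Each node
$v$ applies its node encoder as

$$\underline{X}_t^{(v)} = \underline{X}_t^{(v)}\left(\underline{Y}_1^{(v)}, \dots, \underline{Y}_{t-1}^{(v)}, \underline{W}^{(v\to 1)}, \dots, \underline{W}^{(v\to m)}\right),$$

which it encodes (if necessary) as

$$\underline{\tilde{X}}_t^{(v)} = \begin{cases} \left(\alpha_{N,t}(\underline{X}_t^{(i,1)}),\ \underline{X}_t^{(i,2)}\right) & \text{if } v = i \\ \underline{X}_t^{(v)} & \text{if } v \ne i \end{cases}$$

before transmission. Here $\underline{Y}_t^{(v)}$ designates the channel output after any necessary decoding, giving

$$\underline{Y}_t^{(v)} = \begin{cases} \left(\beta_{N,t}(\underline{\tilde{Y}}_t^{(j,1)}),\ \underline{\tilde{Y}}_t^{(j,2)}\right) & \text{if } v = j \\ \underline{\tilde{Y}}_t^{(v)} & \text{if } v \ne j. \end{cases}$$

Finally, node $v$ applies the decoders from $\underline{\mathcal{S}}(\underline{\mathcal{N}})$ as

$$\underline{\hat{W}}^{(u\to v)} = \underline{\hat{W}}^{(u\to v)}\left(\underline{Y}_1^{(v)}, \dots, \underline{Y}_n^{(v)}, \underline{W}^{(v\to 1)}, \dots, \underline{W}^{(v\to m)}\right).$$

TABLE I
Summary of notation for solution $\underline{\mathcal{S}}(\underline{\mathcal{N}})$

Variable                                                                        Meaning
$\mathcal{S}(\mathcal{N})$                                                      solution used in each layer of $\underline{\mathcal{S}}(\underline{\mathcal{N}})$
$n$                                                                             blocklength of solutions $\mathcal{S}(\mathcal{N})$ and $\underline{\mathcal{S}}(\underline{\mathcal{N}})$
$\underline{W} = (\underline{W}^{(u\to v)} : (u, v) \in \{1, \dots, m\}^2)$                    messages
$\underline{X}_t = (\underline{X}_t^{(v)} : v \in \{1, \dots, m\})$                            network inputs at time $t$
$\underline{Y}_t = (\underline{Y}_t^{(v)} : v \in \{1, \dots, m\})$                            network outputs at time $t$
$\underline{\hat{W}} = (\underline{\hat{W}}^{(u\to v)} : (u, v) \in \{1, \dots, m\}^2)$        reconstruction of messages

Solution $\mathcal{S}(\underline{\mathcal{N}}_R)$ is not a stacked solution since each $(\alpha_{N,t}, \beta_{N,t})$ operates across the layers of the stack.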

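Step 5 - Characterizing the behavior of $\mathcal{S}(\underline{\mathcal{N}}_R)$:

In order to analyze the error probability of code $\mathcal{S}(\underline{\mathcal{N}}_R)$ we first characterize its statistical behavior. Table I
summarizes the random variables used in the definition of the solution $\underline{\mathcal{S}}(\underline{\mathcal{N}})$ from which $\mathcal{S}(\underline{\mathcal{N}}_R)$ is built.
Applying solution $\underline{\mathcal{S}}(\underline{\mathcal{N}})$ on $N$-fold stacked network $\underline{\mathcal{N}}$ yields joint distribution

$$p(\underline{w}, \underline{x}^n, \underline{y}^n, \underline{\hat{w}}) = p(\underline{w}) \left[ \prod_{t=1}^{n} p(\underline{x}_t | \underline{y}^{t-1}, \underline{w}) \right] \left[ \prod_{t=1}^{n} p(\underline{y}_t | \underline{x}_t) \right] p(\underline{\hat{w}} | \underline{w}, \underline{y}^n).$$

Here $p(\underline{w})$ is the distribution on messages; $p(\underline{x}_t | \underline{y}^{t-1}, \underline{w})$ results from the operation of all node encoders at
time $t$, each of which maps its received channel outputs and outgoing messages to channel inputs; $p(\underline{y}_t | \underline{x}_t)$
is the memoryless channel distribution; and $p(\underline{\hat{w}} | \underline{w}, \underline{y}^n)$ results from the operation of all node decoders,
each of which maps its received channel outputs and outgoing messages to reproductions of its incoming
messages. Here $p(\underline{x}_t | \underline{y}^{t-1}, \underline{w})$ and $p(\underline{\hat{w}} | \underline{w}, \underline{y}^n)$ capture both the distribution over channel codes and the
deterministic operation of the node encoders from $\mathcal{S}(\mathcal{N})$.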

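The corresponding distribution for solution $\mathcal{S}(\underline{\mathcal{N}}_R)$ on $\underline{\mathcal{N}}_R$ is similar. In particular, since the distribution
on messages is given and we employ all of the same codes, distributions $p(\underline{w})$, $p(\underline{x}_t | \underline{y}^{t-1}, \underline{w})$, and
$p(\underline{\hat{w}} | \underline{w}, \underline{y}^n)$ remain unchanged. The only difference between $\underline{\mathcal{S}}(\underline{\mathcal{N}})$ and $\mathcal{S}(\underline{\mathcal{N}}_R)$ is the replacement of
channel $(\underline{\mathcal{X}}^{(i,1)}, p(\underline{y}^{(j,1)} | \underline{x}^{(i,1)}), \underline{\mathcal{Y}}^{(j,1)})$ by the random channel emulator. Thus at time $t$, solution $\mathcal{S}(\underline{\mathcal{N}}_R)$
replaces the channel distribution

$$p(\underline{y}_t^{(j,1)} | \underline{x}_t^{(i,1)}) = \prod_{\ell=1}^{N} p\big(y_t^{(j,1)}(\ell) \,\big|\, x_t^{(i,1)}(\ell)\big)$$

by the emulation distribution

$$\hat{p}_t(\underline{y}_t^{(j,1)} | \underline{x}_t^{(i,1)}) = \Pr\big(\beta_{N,t}(\alpha_{N,t}(\underline{x}_t^{(i,1)})) = \underline{y}_t^{(j,1)}\big),$$

where the probability is taken over the random code design.
(Note that the channel emulator eventually applied is a deterministic source code. The given distribution
reflects only the random code design.) Thus $\mathcal{S}(\underline{\mathcal{N}}_R)$ achieves distribution

$$\hat{p}(\underline{w}, \underline{x}^n, \underline{y}^n, \underline{\hat{w}}) = p(\underline{w}) \left[ \prod_{t=1}^{n} p(\underline{x}_t | \underline{y}^{t-1}, \underline{w}) \right] \left[ \prod_{t=1}^{n} \hat{p}_t(\underline{y}_t^{(j,1)} | \underline{x}_t^{(i,1)})\, p(\underline{y}_t^{-(j,1)} | \underline{x}_t^{-(i,1)}) \right] p(\underline{\hat{w}} | \underline{w}, \underline{y}^n). \quad (7)$$

In general, $\hat{p}_t(\underline{y}^{(j,1)} | \underline{x}^{(i,1)})$ will not be precisely equal to the channel distribution $p(\underline{y}^{(j,1)} | \underline{x}^{(i,1)})$ that it is
designed to emulate. A lemma in Appendix II shows

$$\hat{p}_t(\underline{y}^{(j,1)} | \underline{x}^{(i,1)}) \le p(\underline{y}^{(j,1)} | \underline{x}^{(i,1)})\, 2^{N(4a(\epsilon,t) + 2\epsilon(t) + 1/N)} \quad (8)$$

for all $(\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) \in A_{\epsilon,t}^{(N)}$. Let

$$\hat{p}_t\big((A_{\epsilon,t}^{(N)})^c \,\big|\, \underline{x}^{(i,1)}\big) \stackrel{\mathrm{def}}{=} \sum_{\underline{y}^{(j,1)} : (\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) \notin A_{\epsilon,t}^{(N)}} \hat{p}_t(\underline{y}^{(j,1)} | \underline{x}^{(i,1)})$$

denote the conditional probability that $(\underline{X}_t^{(i,1)}, \underline{Y}_t^{(j,1)}) \notin A_{\epsilon,t}^{(N)}$ given $\underline{X}_t^{(i,1)} = \underline{x}^{(i,1)}$ under operation of code
$\mathcal{S}(\underline{\mathcal{N}}_R)$. Using a proof similar to that for the rate-distortion theorem, Lemma 10 in Appendix II shows

$$\hat{p}_t\big((A_{\epsilon,t}^{(N)})^c \,\big|\, \underline{x}^{(i,1)}\big) \le p_t\big((A_{\epsilon,t}^{(N)})^c \,\big|\, \underline{x}^{(i,1)}\big) + e^{-2^{N(R - I(X_t^{(i,1)}; Y_t^{(j,1)}) - 2a(\epsilon,t) - \epsilon(t))}}. \quad (9)$$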



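Step 6 - Bounding the expected error probability:

The following error analysis relies on both probabilities resulting from the operation of $\underline{\mathcal{S}}(\underline{\mathcal{N}})$ on $\underline{\mathcal{N}}$ and
probabilities resulting from the operation of random code $\mathcal{S}(\underline{\mathcal{N}}_R)$ on $\underline{\mathcal{N}}_R$. To avoid confusion between the
two, we use $\Pr$ in the former case and $\widehat{\Pr}$ in the latter case.

Define

$$B_t^{(N)} \stackrel{\mathrm{def}}{=} \left\{ (\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) \in \underline{\mathcal{X}}^{(i,1)} \times \underline{\mathcal{Y}}^{(j,1)} : \Pr\big(\underline{\hat{W}} \ne \underline{W} \,\big|\, (\underline{X}_t^{(i,1)}, \underline{Y}_t^{(j,1)}) = (\underline{x}^{(i,1)}, \underline{y}^{(j,1)})\big) > 2^{-N\delta/2} \right\} \quad (10)$$

to be the set of input-output pairs on channel $(\underline{\mathcal{X}}^{(i,1)}, p(\underline{y}^{(j,1)} | \underline{x}^{(i,1)}), \underline{\mathcal{Y}}^{(j,1)})$ at time $t$ that are most likely to
lead to errors in the operation of $\underline{\mathcal{S}}(\underline{\mathcal{N}})$ on $\underline{\mathcal{N}}$; we think of $B_t^{(N)}$ as the "bad" set. For each $t \in \{1, \dots, n\}$
we treat $(\underline{X}_t^{(i,1)}, \underline{Y}_t^{(j,1)}) \notin A_{\epsilon,t}^{(N)} \setminus B_t^{(N)}$ as an error event. We therefore define $G_t \subseteq (\underline{\mathcal{X}}^{(i,1)} \times \underline{\mathcal{Y}}^{(j,1)})^n$ as

$$G_t \stackrel{\mathrm{def}}{=} \bigcap_{t'=1}^{t} \left\{ (\underline{X}_{t'}^{(i,1)}, \underline{Y}_{t'}^{(j,1)}) \in A_{\epsilon,t'}^{(N)} \setminus B_{t'}^{(N)} \right\} \quad \text{for each } t \in \{1, \dots, n\}$$

and $G_0 = (\underline{\mathcal{X}}^{(i,1)} \times \underline{\mathcal{Y}}^{(j,1)})^n$ to be the event that none of these error events has occurred in the first $t$ time
steps; we think of each $G_t$ as a "good" set since it describes the event that channel input-output pairs up to
time $t$ were typical and not "bad." Since $(G_n)^c = \bigcup_{t=1}^{n} \big( (G_{t-1} \cap (A_{\epsilon,t}^{(N)})^c) \cup (G_{t-1} \cap A_{\epsilon,t}^{(N)} \cap B_t^{(N)}) \big)$, the
union bound gives

$$\widehat{\Pr}(\underline{\hat{W}} \ne \underline{W}) \le \sum_{t=1}^{n} \left[ \widehat{\Pr}\big(G_{t-1} \cap (A_{\epsilon,t}^{(N)})^c\big) + \widehat{\Pr}\big(G_{t-1} \cap A_{\epsilon,t}^{(N)} \cap B_t^{(N)}\big) \right] + \widehat{\Pr}\big(G_n \cap \{\underline{\hat{W}} \ne \underline{W}\}\big).$$

This is an expected error probability since $\widehat{\Pr}(\cdot)$ captures the random code design in addition to the random
message choice and random action of the channel $(\underline{\mathcal{X}}^{-(i,1)}, p(\underline{y}^{-(j,1)} | \underline{x}^{-(i,1)}), \underline{\mathcal{Y}}^{-(j,1)})$. To bound the first
two terms in this sum, note that by (7) and (8),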



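$$\widehat{\Pr}\big(G_{t-1} \cap \{\underline{X}_t^{(i,1)} = \underline{x}^{(i,1)}\}\big) \le \sum_{(\underline{w}, \underline{x}^{t-1}, \underline{y}^{t-1}, \underline{x}_t^{-(i,1)}) \,:\, (\underline{x}_{t'}^{(i,1)}, \underline{y}_{t'}^{(j,1)}) \in A_{\epsilon,t'}^{(N)}\ \forall t' < t} p(\underline{w}) \left[ \prod_{t'=1}^{t} p(\underline{x}_{t'} | \underline{y}^{t'-1}, \underline{w}) \right] \left[ \prod_{t'=1}^{t-1} \hat{p}_{t'}(\underline{y}_{t'}^{(j,1)} | \underline{x}_{t'}^{(i,1)})\, p(\underline{y}_{t'}^{-(j,1)} | \underline{x}_{t'}^{-(i,1)}) \right]$$

$$\le \sum_{(\underline{w}, \underline{x}^{t-1}, \underline{y}^{t-1}, \underline{x}_t^{-(i,1)}) \,:\, (\underline{x}_{t'}^{(i,1)}, \underline{y}_{t'}^{(j,1)}) \in A_{\epsilon,t'}^{(N)}\ \forall t' < t} 2^{N \sum_{t'=1}^{t-1} (4a(\epsilon,t') + 2\epsilon(t') + 1/N)}\, p(\underline{w}, \underline{x}^{t}, \underline{y}^{t-1})$$

$$\le 2^{N \sum_{t'=1}^{t-1} (4a(\epsilon,t') + 2\epsilon(t') + 1/N)}\, p_t(\underline{x}^{(i,1)}) \quad (11)$$

for each $\underline{x}^{(i,1)} \in \underline{\mathcal{X}}^{(i,1)}$. This bound captures how the input distribution to node $i$ at time $t$ is affected by the
replacement of the channel by its emulator in all previous time steps. Applying (9), (11), and (4) gives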

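$$\widehat{\Pr}\big(G_{t-1} \cap (A_{\epsilon,t}^{(N)})^c\big) = \sum_{\underline{x}^{(i,1)} \in \underline{\mathcal{X}}^{(i,1)}} \widehat{\Pr}\big(G_{t-1} \cap \{\underline{X}_t^{(i,1)} = \underline{x}^{(i,1)}\}\big)\, \hat{p}_t\big((A_{\epsilon,t}^{(N)})^c \,\big|\, \underline{x}^{(i,1)}\big)$$

$$\le \sum_{\underline{x}^{(i,1)} \in \underline{\mathcal{X}}^{(i,1)}} \widehat{\Pr}\big(G_{t-1} \cap \{\underline{X}_t^{(i,1)} = \underline{x}^{(i,1)}\}\big) \left[ p_t\big((A_{\epsilon,t}^{(N)})^c \,\big|\, \underline{x}^{(i,1)}\big) + e^{-2^{N(R - I(X_t^{(i,1)}; Y_t^{(j,1)}) - 2a(\epsilon,t) - \epsilon(t))}} \right]$$

$$\le 2^{N \sum_{t'=1}^{t-1} (4a(\epsilon,t') + 2\epsilon(t') + 1/N)}\, p_t\big((A_{\epsilon,t}^{(N)})^c\big) + e^{-2^{N(R - I(X_t^{(i,1)}; Y_t^{(j,1)}) - 2a(\epsilon,t) - \epsilon(t))}}$$

$$\le 2^{-N\left(c(\epsilon,t) - \sum_{t'=1}^{t-1} (4a(\epsilon,t') + 2\epsilon(t') + 1/N)\right)} + e^{-2^{N(R - I(X_t^{(i,1)}; Y_t^{(j,1)}) - 2a(\epsilon,t) - \epsilon(t))}}. \quad (12)$$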



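To bound $\widehat{\Pr}(G_{t-1} \cap A_{\epsilon,t}^{(N)} \cap B_t^{(N)})$, recall that for all $N$ sufficiently large $\underline{\mathcal{S}}(\underline{\mathcal{N}})$ is a $(2^{-N\delta}, \mathcal{R})$ solution
for $\underline{\mathcal{N}}$ and that there are fewer than $m^2$ messages to transmit. Thus for solution $\underline{\mathcal{S}}(\underline{\mathcal{N}})$ on $\underline{\mathcal{N}}$,
$\Pr(\underline{\hat{W}} \ne \underline{W}) \le m^2 2^{-N\delta}$ by the union bound, giving

$$m^2 2^{-N\delta} \ge \Pr(\underline{\hat{W}} \ne \underline{W})$$

$$\ge \sum_{(\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) \in B_t^{(N)}} p_t(\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) \Pr\big(\underline{\hat{W}} \ne \underline{W} \,\big|\, (\underline{x}^{(i,1)}, \underline{y}^{(j,1)})\big)$$

$$\ge 2^{-N\delta/2}\, p_t(B_t^{(N)}),$$

which implies $p_t(B_t^{(N)}) \le m^2 2^{-N\delta/2}$ on $\underline{\mathcal{N}}$. Thus for solution $\mathcal{S}(\underline{\mathcal{N}}_R)$ on $\underline{\mathcal{N}}_R$,

$$\widehat{\Pr}\big(G_{t-1} \cap A_{\epsilon,t}^{(N)} \cap B_t^{(N)}\big) = \sum_{\underline{x}^{(i,1)} \in \underline{\mathcal{X}}^{(i,1)}} \widehat{\Pr}\big(G_{t-1} \cap \{\underline{X}_t^{(i,1)} = \underline{x}^{(i,1)}\}\big)\, \hat{p}_t\big(A_{\epsilon,t}^{(N)} \cap B_t^{(N)} \,\big|\, \underline{x}^{(i,1)}\big)$$

$$\stackrel{(a)}{\le} 2^{N \sum_{t'=1}^{t-1} (4a(\epsilon,t') + 2\epsilon(t') + 1/N)} \sum_{\underline{x}^{(i,1)}} p_t(\underline{x}^{(i,1)})\, \hat{p}_t\big(A_{\epsilon,t}^{(N)} \cap B_t^{(N)} \,\big|\, \underline{x}^{(i,1)}\big)$$

$$\stackrel{(b)}{\le} 2^{N \sum_{t'=1}^{t} (4a(\epsilon,t') + 2\epsilon(t') + 1/N)} \sum_{\underline{x}^{(i,1)}} p_t(\underline{x}^{(i,1)})\, p_t\big(A_{\epsilon,t}^{(N)} \cap B_t^{(N)} \,\big|\, \underline{X}^{(i,1)} = \underline{x}^{(i,1)}\big)$$

$$\le m^2 2^{-N\left(\delta/2 - \sum_{t'=1}^{t} (4a(\epsilon,t') + 2\epsilon(t') + 1/N)\right)},$$

where (a) follows from (11) and (b) follows from (8). Finally,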

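$$\widehat{\Pr}\big(G_n \cap \{\underline{\hat{W}} \ne \underline{W}\}\big) = \sum_{(\underline{w}, \underline{x}^n, \underline{y}^n, \underline{\hat{w}}) \,:\, \underline{\hat{w}} \ne \underline{w},\ (\underline{x}_t^{(i,1)}, \underline{y}_t^{(j,1)}) \in A_{\epsilon,t}^{(N)} \setminus B_t^{(N)}\ \forall t} \hat{p}(\underline{w}, \underline{x}^n, \underline{y}^n, \underline{\hat{w}})$$

$$\stackrel{(a)}{\le} \sum_{(\underline{w}, \underline{x}^n, \underline{y}^n, \underline{\hat{w}}) \,:\, \underline{\hat{w}} \ne \underline{w},\ (\underline{x}_t^{(i,1)}, \underline{y}_t^{(j,1)}) \in A_{\epsilon,t}^{(N)} \setminus B_t^{(N)}\ \forall t} 2^{N \sum_{t=1}^{n} (4a(\epsilon,t) + 2\epsilon(t) + 1/N)}\, p(\underline{w}) \left[ \prod_{t=1}^{n} p(\underline{x}_t | \underline{y}^{t-1}, \underline{w}) \right] \left[ \prod_{t=1}^{n} p(\underline{y}_t | \underline{x}_t) \right] p(\underline{\hat{w}} | \underline{w}, \underline{y}^n) \quad (13)$$

$$\stackrel{(b)}{\le} 2^{N \sum_{t=1}^{n} (4a(\epsilon,t) + 2\epsilon(t) + 1/N)} \sum_{(\underline{w}, \underline{x}^n, \underline{y}^n, \underline{\hat{w}}) \,:\, \underline{\hat{w}} \ne \underline{w},\ (\underline{x}_1^{(i,1)}, \underline{y}_1^{(j,1)}) \in A_{\epsilon,1}^{(N)} \setminus B_1^{(N)}} p(\underline{w}, \underline{x}^n, \underline{y}^n, \underline{\hat{w}})$$

$$= 2^{N \sum_{t=1}^{n} (4a(\epsilon,t) + 2\epsilon(t) + 1/N)} \sum_{(\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) \in A_{\epsilon,1}^{(N)} \setminus B_1^{(N)}} p_1(\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) \Pr\big(\underline{\hat{W}} \ne \underline{W} \,\big|\, (\underline{x}^{(i,1)}, \underline{y}^{(j,1)})\big)$$

$$\stackrel{(c)}{\le} 2^{N \sum_{t=1}^{n} (4a(\epsilon,t) + 2\epsilon(t) + 1/N)}\, 2^{-N\delta/2}. \quad (14)$$

Equation (a) follows from (7) and (8). In (b), we sum over $\underline{\mathcal{X}}^{(i,1)} \times \underline{\mathcal{Y}}^{(j,1)}$ rather than $A_{\epsilon,t}^{(N)} \setminus B_t^{(N)}$ for all $t > 1$.
Equation (c) follows from the definition of $B_1^{(N)}$ in (10) and the bound $p_1(A_{\epsilon,1}^{(N)} \setminus B_1^{(N)}) \le 1$.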

Step 7 - Parameter choice:

We finally show that we can choose typical set parameters $\epsilon = (\epsilon(1), \dots, \epsilon(n))$ such that $\widehat{\Pr}(\underline{\hat{W}} \ne \underline{W}) < \lambda$
for all $N$ sufficiently large. Since $n$ is fixed and finite, (12), (13), and (14) imply that the expected error
probability of $\mathcal{S}(\underline{\mathcal{N}}_R)$ goes to zero provided

$$\sum_{t'=1}^{t-1} \left(4a(\epsilon,t') + 2\epsilon(t') + 1/N\right) < c(\epsilon, t)$$

$$2a(\epsilon,t) + \epsilon(t) < R - I(X_t^{(i,1)}; Y_t^{(j,1)})$$

$$\sum_{t'=1}^{n} \left(4a(\epsilon,t') + 2\epsilon(t') + 1/N\right) < \delta/2.$$

Recall that constants $a(\epsilon,t)$ (defined in (3)) and $c(\epsilon,t)$ (defined in Lemma 6 in Appendix I) depend only
on distribution $p_t(x^{(i,1)}, y^{(j,1)})$ and the value $\epsilon(t)$. Each goes to 0 as $\epsilon(t)$ approaches 0. The following
sequential choice of $\epsilon(n), \dots, \epsilon(1)$ yields the desired result. Set $\epsilon(n)$ such that $4a(\epsilon, n) + 2\epsilon(n) < \delta/(4n)$.
Then for each subsequent $t$, set $\epsilon(t)$ such that

$$2a(\epsilon,t) + \epsilon(t) < \min\left\{ \frac{\delta}{4n},\ \frac{R - I(X_t^{(i,1)}; Y_t^{(j,1)})}{2},\ \frac{c(\epsilon, t+1)}{t+1},\ \dots,\ \frac{c(\epsilon, n)}{n} \right\}.$$

This gives the desired result since $R > I(X_t^{(i,1)}; Y_t^{(j,1)})$ (by the theorem assumption and definition of
capacity) and $\delta > 0$.

Since the expected error probability with respect to the given distribution over codes approaches zero as N 
grows without bound, there must exist a single instance of the code S(N R ) that does at least as well. ■ 

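Remark 2 It is interesting to specify the choice of parameters in Theorems 1 and 4 required to guarantee
the existence of a $(\lambda, \mathcal{R})$-$\mathcal{S}(\underline{\mathcal{N}}_R)$ solution for an arbitrary $\lambda > 0$ and $\mathcal{R} \in \mathrm{int}(\mathscr{R}(\mathcal{N}))$. Since we
have $\mathcal{R} \in \mathrm{int}(\mathscr{R}(\mathcal{N}))$ there exists a $\tilde{\mathcal{R}} \in \mathrm{int}(\mathscr{R}(\mathcal{N}))$ with $\tilde{\mathcal{R}} > \mathcal{R}$. We choose $\rho$ in Theorem 1
accordingly as $\min_{u,v} \{\tilde{R}^{(u\to v)} - R^{(u\to v)}\}$. Once $\rho$ is chosen, we choose the error parameter $\lambda$ and blocklength $n$ of Theorem 1 so that the condition
$\rho > \max_{u,v} \{\tilde{R}^{(u\to v)}\}\lambda + h(\lambda)/n$ is satisfied for a $(\lambda, \tilde{\mathcal{R}})$-$\mathcal{S}(\mathcal{N})$ solution of blocklength $n$. Note that
$R^{(u\to v)}$ is less than the capacity of the channel $p(\hat{\tilde{w}}^{(u\to v)} | \tilde{w}^{(u\to v)})$ imposed by this solution, so $\delta > 0$.
Fixing $\mathcal{S}(\mathcal{N})$ fixes distributions $p_t(x^{(i,1)})$. We next choose $\epsilon$ as specified above and design source code
$(\alpha_{N,t}, \beta_{N,t})$ for $N$ sufficiently large. The resulting code can be run on $\mathcal{N}_R$ (rather than $\underline{\mathcal{N}}_R$) as described
in the proof of Theorem 1.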

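Corollary 5 finally proves network equivalence for point-to-point channels.
Corollary 5 Consider a pair of networks

$$\mathcal{N} = \left(\mathcal{X}^{-(i,1)} \times \mathcal{X}^{(i,1)},\ p(y^{(j,1)} | x^{(i,1)})\, p(y^{-(j,1)} | x^{-(i,1)}),\ \mathcal{Y}^{-(j,1)} \times \mathcal{Y}^{(j,1)}\right)$$

$$\mathcal{N}_C = \left(\mathcal{X}^{-(i,1)} \times \{0,1\}^C,\ \delta(y^{(j,1)} - x^{(i,1)})\, p(y^{-(j,1)} | x^{-(i,1)}),\ \mathcal{Y}^{-(j,1)} \times \{0,1\}^C\right),$$

where $(\mathcal{X}^{(i,1)}, p(y^{(j,1)} | x^{(i,1)}), \mathcal{Y}^{(j,1)})$ is a channel of capacity $C = \max_{p(x^{(i,1)})} I(X^{(i,1)}; Y^{(j,1)}) > 0$. Then

$$\mathscr{R}(\mathcal{N}) = \mathscr{R}(\mathcal{N}_C).$$

Proof. The result is immediate from Lemmas 2 and 3 and Theorem 4. ■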

VI. Conclusions 

The preceding results show that the capacity of a memoryless network containing an independent point-
to-point channel equals the capacity of another network where that noisy channel is replaced by
a noiseless bit pipe of the same capacity; thus any collection of demands (e.g., a collection of unicast
demands) can be met on the first network if and only if it can be met on the second. Sequentially applying 
this result to each channel in a network of point-to-point channels proves that the capacity of a network of 
independent, memoryless, point-to-point channels equals the capacity of a network of noiseless bit-pipes of 
the corresponding capacities. This also implies that the capacity of a network of independent, memoryless, 
point-to-point channels equals the capacity of any other network of independent, memoryless, point-to- 
point channels of the same capacities. Thus, from the perspective of capacity, a Gaussian channel is no 
different from a binary erasure channel of the same capacity, despite the Gaussian channel's far broader 
range of possible behaviors. The given equivalence result proves the optimality of coding strategies that 
separate joint source and network coding from channel coding; there is no loss in capacity associated 
with performing independent channel coding on every point-to-point channel. The result also opens the 
way to the analysis of noisy networks using analytical and computational tools built for characterizing 
network coding capacities. 

In addition to proving the equivalence between networks of noisy channels and networks of point-to- 
point bit pipes, the other main contribution of this work is the introduction of a new strategy for tackling 
networks of noisy components. Lemma 3 and Theorem 4 show that the capacity of one network is a
subset of the capacity of another network by showing that any code that can be run with asymptotically 
negligible error probability on the first network can be run on the second network with similar error 
probability. In part II of this paper, we apply the same approach in bounding the capacities of more 
general networks. This approach represents one step towards the goal of building computational tools for 
bounding capacities of networks using deterministic models of the network's component channels.

Appendix I

Lemma 6

Lemma 6 proves that $p_t((A_{\epsilon,t}^{(N)})^c)$ decays exponentially to zero. Using the notation of Section IV,
$\underline{X}^{(i,1)} = (X^{(i,1)}(1), \dots, X^{(i,1)}(N))$ and $\underline{Y}^{(j,1)} = (Y^{(j,1)}(1), \dots, Y^{(j,1)}(N))$ denote $N$-dimensional vectors
corresponding to the $N$-fold stacked network.

Lemma 6 Let $(\underline{X}^{(i,1)}, \underline{Y}^{(j,1)})$ be drawn i.i.d. $p_t(x^{(i,1)}, y^{(j,1)})$. Then there exists a constant $c(\epsilon,t) > 0$ for
which

$$p_t\big((A_{\epsilon,t}^{(N)})^c\big) \le 2^{-Nc(\epsilon,t)}$$

for all $N$ sufficiently large. Constant $c(\epsilon, t)$ approaches 0 as $\epsilon(t) > 0$ approaches 0.


Proof. The result follows from Chernoff's bound, which we apply to averages of i.i.d. random variables. Chernoff's bound states that for any i.i.d. random variables $A(1), A(2), \ldots, A(N)$,

$$\Pr\left(\frac{1}{N}\sum_{i=1}^{N} A(i) \ge a\right) \le e^{N \min_{s>0}[M(s) - sa]},$$

where $M(s) \stackrel{\text{def}}{=} \ln E[e^{sA}]$ and $\min_{s>0}[M(s) - sa] \le 0$ for all $a \ge E[A]$ with equality if and only if $a = E[A]$ (see, for example, [20, pp. 482-484]). Note that $|\min_{s>0}[M(s) - sa]|$ grows without bound as $a$ increases while $|\min_{s>0}[M(s) - sa]|$ approaches 0 as $a$ approaches $E[A]$.
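As a numerical illustration of the Chernoff bound quoted above, the following sketch compares the exact tail probability with the bound $e^{N\min_{s>0}[M(s)-sa]}$. The choice of Bernoulli random variables, the parameter values, and the function names are assumptions made only for this example; they do not appear in the paper.

```python
import math

def chernoff_exponent(p, a):
    """Return min_{s>0} [M(s) - s*a] with M(s) = ln E[exp(s*A)], A ~ Bernoulli(p).

    Assumes p < a < 1, so the minimizing s is positive and has the closed
    form obtained by setting the derivative of M(s) - s*a to zero.
    """
    s_star = math.log(a * (1 - p) / (p * (1 - a)))
    m = math.log(1 - p + p * math.exp(s_star))
    return m - s_star * a

def exact_tail(p, a, n):
    """Exact Pr((1/n) * sum_i A(i) >= a) for i.i.d. Bernoulli(p) variables."""
    k_min = math.ceil(a * n - 1e-12)  # smallest count whose average reaches a
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

p, a = 0.3, 0.5  # illustrative values with a > E[A] = p
for n in (10, 50, 200):
    bound = math.exp(n * chernoff_exponent(p, a))
    print(n, exact_tail(p, a, n), bound)
```

The exponent is negative whenever $a > E[A]$, so the bound decays exponentially in $N$, which is the only property the proof uses.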

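We begin by applying the Chernoff bound to the following sequence of random variables:

$$-\log p_t(X_t^{(i,1)}(1)), \ldots, -\log p_t(X_t^{(i,1)}(N)).$$

We then negate the sequence and apply the Chernoff bound again. Combining these results with the union bound gives

$$p_t\left(\left|-\frac{1}{N}\log p_t(\underline{X}_t^{(i,1)}) - H(X_t^{(i,1)})\right| > \epsilon(t)\right) \le 2^{-N b_0 + 1}$$

for some $b_0 > 0$ as discussed above. Likewise, for any $\epsilon' > 0$,

$$p_t\left(\left|-\frac{1}{N}\log p_t(\underline{Y}_t^{(j,1)}) - H(Y_t^{(j,1)})\right| > \epsilon'\right) \le 2^{-N b_1 + 1}$$

$$p_t\left(\left|-\frac{1}{N}\log p_t(\underline{X}_t^{(i,1)}, \underline{Y}_t^{(j,1)}) - H(X_t^{(i,1)}, Y_t^{(j,1)})\right| > \epsilon'\right) \le 2^{-N b_2 + 1}$$

for some $b_1, b_2 > 0$. Since $b_1$ and $b_2$ can be made arbitrarily large by choosing $\epsilon'$ large enough, the infimum in the definition of $a(\epsilon, t)$ is well-defined.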
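Now note that

$$p_t\big((A_{\epsilon,t}^{(N)})^c\big) \le p_t\left(\left|-\frac{1}{N}\log p_t(\underline{X}_t^{(i,1)}) - H(X_t^{(i,1)})\right| > \epsilon(t)\right)$$
$$\quad + p_t\left(\left|-\frac{1}{N}\log p_t(\underline{Y}_t^{(j,1)}) - H(Y_t^{(j,1)})\right| > a(\epsilon,t) \ \vee\ \left|-\frac{1}{N}\log p_t(\underline{X}_t^{(i,1)}, \underline{Y}_t^{(j,1)}) - H(X_t^{(i,1)}, Y_t^{(j,1)})\right| > a(\epsilon,t)\right)$$
$$\le 2^{-N b_0 + 1} + 2^{-6N\epsilon(t)},$$

where the first inequality applies the union bound and the second inequality follows from our first Chernoff bound and the definition of $a(\epsilon, t)$. Let

$$C_t^{(N)} \stackrel{\text{def}}{=} \left\{\underline{x}^{(i,1)} \in \underline{\mathcal{X}}^{(i,1)} : \left|-\frac{1}{N}\log p_t(\underline{x}^{(i,1)}) - H(X_t^{(i,1)})\right| \le \epsilon(t),\ p_t\big((A_{\epsilon,t}^{(N)})^c \,\big|\, \underline{X}_t^{(i,1)} = \underline{x}^{(i,1)}\big) > 2^{-3N\epsilon(t)}\right\}.$$

Then

$$p_t\big((\hat{A}_{\epsilon,t}^{(N)})^c\big) = p_t\big((A_{\epsilon,t}^{(N)})^c\big) + p_t\big(\{(\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) \in A_{\epsilon,t}^{(N)} : \underline{x}^{(i,1)} \in C_t^{(N)}\}\big)$$
$$\le p_t\big((A_{\epsilon,t}^{(N)})^c\big) + p_t(C_t^{(N)})$$
$$\le 2^{-N b_0 + 1} + 2^{-6N\epsilon(t)} + p_t(C_t^{(N)}).$$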

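To bound $p_t(C_t^{(N)})$, note that from the definitions of $a(\epsilon,t)$ and $C_t^{(N)}$,

$$2^{-6N\epsilon(t)} \ge p_t\left(\left|-\frac{1}{N}\log p_t(\underline{Y}_t^{(j,1)}) - H(Y_t^{(j,1)})\right| > a(\epsilon,t) \ \vee\ \left|-\frac{1}{N}\log p_t(\underline{X}_t^{(i,1)}, \underline{Y}_t^{(j,1)}) - H(X_t^{(i,1)}, Y_t^{(j,1)})\right| > a(\epsilon,t)\right)$$
$$\ge \sum_{\underline{x}^{(i,1)} \in C_t^{(N)}} p_t(\underline{x}^{(i,1)})\, p_t\big((A_{\epsilon,t}^{(N)})^c \,\big|\, \underline{X}_t^{(i,1)} = \underline{x}^{(i,1)}\big)$$
$$\ge p_t(C_t^{(N)})\, 2^{-3N\epsilon(t)}.$$

Thus $p_t(C_t^{(N)}) \le 2^{-3N\epsilon(t)}$, which gives the desired result.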



Appendix II
Lemma 9

Lemma 9 bounds the distribution $\hat{p}_t(\underline{y}^{(j,1)}|\underline{x}^{(i,1)})$ obtained by random source code $(\alpha_{N,t}, \beta_{N,t})$. Our restriction on the typical set is useful for that proof. The randomness in $\hat{p}_t(\underline{y}^{(j,1)}|\underline{x}^{(i,1)})$ results from the random source code choice. Lemmas 7 and 8 are intermediate steps used in the proof of Lemma 9.
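Let function $K_t(\underline{x}^{(i,1)}, \underline{y}^{(j,1)})$ be defined as

$$K_t(\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) \stackrel{\text{def}}{=} \begin{cases} 1 & \text{if } (\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) \in \hat{A}_{\epsilon,t}^{(N)} \\ 0 & \text{otherwise} \end{cases} \qquad (15)$$

(cf. [15, steps 10.93-10.102]). Lemma 7, below, characterizes $\hat{p}_t(\underline{y}^{(j,1)}|\underline{x}^{(i,1)})$ as a function of the probability

$$q_t(\underline{x}^{(i,1)}) = \sum_{\underline{y}^{(j,1)}} p_t(\underline{y}^{(j,1)})\, K_t(\underline{x}^{(i,1)}, \underline{y}^{(j,1)})$$

that a single codeword drawn at random is jointly typical with $\underline{x}^{(i,1)}$. Precisely, the lemma shows that $p_t(\underline{y}^{(j,1)}) / q_t(\underline{x}^{(i,1)})$ is the probability that $\underline{x}^{(i,1)}$ is mapped to $\underline{y}^{(j,1)}$ given that there is at least one codeword in the codebook that is typical with $\underline{x}^{(i,1)}$. Lemma 8 then bounds $q_t(\underline{x}^{(i,1)})$ for all $\underline{x}^{(i,1)}$ satisfying the conditions of $\hat{A}_{\epsilon,t}^{(N)}$.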
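The relationship described above can be sanity-checked numerically. The sketch below assumes an encoder that draws `M` codewords i.i.d. (with `M` standing in for $2^{NR}$) and picks uniformly among the codewords typical with the input. It compares a direct double sum over the number `j` of typical codewords and the number `k` of those equal to a given $\underline{y}$ against the product of $p_t(\underline{y})/q_t(\underline{x})$ and the probability that at least one codeword is typical. The function names and numerical values are hypothetical.

```python
import math

def output_prob_double_sum(b, q, M):
    """Probability that the encoder outputs one particular typical value y.

    b = probability a random codeword equals y (p_t(y) in the text),
    q = probability a random codeword is typical with x (q_t(x)),
    M = number of codewords. Exactly j codewords are typical, k of them
    equal y, and the uniform choice among typical codewords gives k/j.
    """
    a = q - b  # typical with x but different from y
    total = 0.0
    for j in range(1, M + 1):
        for k in range(1, j + 1):
            ways = math.comb(M, j) * math.comb(j, k)
            total += ways * b**k * a**(j - k) * (1 - q)**(M - j) * (k / j)
    return total

def output_prob_closed_form(b, q, M):
    """(p_t(y) / q_t(x)) times Pr(at least one of the M codewords is typical)."""
    return (b / q) * (1 - (1 - q)**M)

print(output_prob_double_sum(0.05, 0.2, 12), output_prob_closed_form(0.05, 0.2, 12))
```

The inner sum collapses because $\sum_k \binom{j}{k} b^k a^{j-k} (k/j) = b(a+b)^{j-1}$, after which the outer sum is a binomial expansion.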



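Lemma 7 Let $(\alpha_{N,t}, \beta_{N,t})$ be the random source code defined earlier. Then for any $(\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) \in \hat{A}_{\epsilon,t}^{(N)}$,

$$\hat{p}_t(\underline{y}^{(j,1)}|\underline{x}^{(i,1)}) = \frac{p_t(\underline{y}^{(j,1)})}{q_t(\underline{x}^{(i,1)})}\left[1 - \big(1 - q_t(\underline{x}^{(i,1)})\big)^{2^{NR}}\right].$$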



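Proof. Recall that $q_t(\underline{x}^{(i,1)})$ is the probability that a single randomly drawn codeword $\underline{Y}^{(j,1)}$ satisfies $(\underline{x}^{(i,1)}, \underline{Y}^{(j,1)}) \in \hat{A}_{\epsilon,t}^{(N)}$. Using the given random code design, for any $(\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) \in \hat{A}_{\epsilon,t}^{(N)}$,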



(f.i)|-.(*,i)^ 



p*(y u ' % 






^ 0,1) ) El" ] -(! - ftfe^)) 2 **^ Er I ^-**^ 1 ]- 



2 7ViJ \ j 
i^l \ J / J fc^l \ & 

Here $j$ is the number of codewords that are jointly typical with $\underline{x}^{(i,1)}$, $k$ is the number of those codewords that equal $\underline{y}^{(j,1)}$, and term $k/j$ follows from the uniform distribution over jointly typical codewords in the encoder design. In the second equality, $a = q_t(\underline{x}^{(i,1)}) - p_t(\underline{y}^{(j,1)})$ and $b = p_t(\underline{y}^{(j,1)})$. Thus

2 NR ( 9 NR \ , * 

ft ^.i)| £ ftD) = Pt (y^)J2[ . r-(l-qt(x^)r R -^ b [(a + by-ai] 

2 NR ( 9 NR \ , 
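Lemma 8 Given $\underline{x}^{(i,1)} \in \underline{\mathcal{X}}^{(i,1)}$, if $\left|-\frac{1}{N}\log p_t(\underline{x}^{(i,1)}) - H(X_t^{(i,1)})\right| \le \epsilon(t)$ and $p_t\big((A_{\epsilon,t}^{(N)})^c \,\big|\, \underline{x}^{(i,1)}\big) \le 2^{-3N\epsilon(t)}$, then

$$q_t(\underline{x}^{(i,1)}) \ge 2^{-N\left(I(X_t^{(i,1)}; Y_t^{(j,1)}) + \epsilon(t) + 2a(\epsilon,t) + 1/N\right)}$$

for all $N$ sufficiently large.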



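Proof. For any $\underline{x}^{(i,1)}$ satisfying the given constraints, we first derive a bound on the number of $\underline{y}^{(j,1)}$ values for which $(\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) \in \hat{A}_{\epsilon,t}^{(N)}$. This is obtained by drawing a random variable $\underline{Y}^{(j,1)}$ according to conditional distribution $\prod_{\ell=1}^{N} p_t(y^{(j,1)}(\ell)\,|\,x^{(i,1)}(\ell))$ and showing that $(\underline{x}^{(i,1)}, \underline{Y}^{(j,1)}) \in \hat{A}_{\epsilon,t}^{(N)}$ with probability approaching 1. Since all $\underline{y}^{(j,1)}$ that are jointly typical with $\underline{x}^{(i,1)}$ are approximately equally probable, this probability bound leads to a bound on the number of $\underline{y}^{(j,1)}$ vectors that are jointly typical with $\underline{x}^{(i,1)}$ and then to a bound on the desired probability.

By the lemma assumptions,

$$p_t\big((\underline{X}^{(i,1)}, \underline{Y}^{(j,1)}) \notin A_{\epsilon,t}^{(N)} \,\big|\, \underline{X}^{(i,1)} = \underline{x}^{(i,1)}\big) \le 2^{-3N\epsilon(t)},$$

which approaches 0 as $N$ grows without bound. Let $F_t(\underline{x}^{(i,1)}) = \{\underline{y}^{(j,1)} : (\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) \in \hat{A}_{\epsilon,t}^{(N)}\}$. Then for $N$ sufficiently large,

$$\frac{1}{2} \le 1 - 2^{-3N\epsilon(t)} \le \sum_{\underline{y}^{(j,1)} \in F_t(\underline{x}^{(i,1)})} p_t(\underline{y}^{(j,1)}|\underline{x}^{(i,1)}) = \sum_{\underline{y}^{(j,1)} \in F_t(\underline{x}^{(i,1)})} \frac{p_t(\underline{x}^{(i,1)}, \underline{y}^{(j,1)})}{p_t(\underline{x}^{(i,1)})} \le |F_t(\underline{x}^{(i,1)})|\, 2^{-N\left(H(Y_t^{(j,1)}|X_t^{(i,1)}) - a(\epsilon,t) - \epsilon(t)\right)},$$

where the last inequality follows from the usual probability bounds for typical strings. Thus

$$|F_t(\underline{x}^{(i,1)})| \ge 2^{N\left(H(Y_t^{(j,1)}|X_t^{(i,1)}) - a(\epsilon,t) - \epsilon(t) - 1/N\right)},$$

which we apply to bound $q_t(\underline{x}^{(i,1)})$ as

$$q_t(\underline{x}^{(i,1)}) = \sum_{\underline{y}^{(j,1)} \in F_t(\underline{x}^{(i,1)})} p_t(\underline{y}^{(j,1)}) \ge |F_t(\underline{x}^{(i,1)})|\, 2^{-N\left(H(Y_t^{(j,1)}) + a(\epsilon,t)\right)} \ge 2^{-N\left(I(X_t^{(i,1)}; Y_t^{(j,1)}) + 2a(\epsilon,t) + \epsilon(t) + 1/N\right)}.$$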
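Lemma 9 For all $(\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) \in \hat{A}_{\epsilon,t}^{(N)}$,

$$\hat{p}_t(\underline{y}^{(j,1)}|\underline{x}^{(i,1)}) \le p(\underline{y}^{(j,1)}|\underline{x}^{(i,1)})\, 2^{N(4a(\epsilon,t) + 2\epsilon(t) + 1/N)}.$$

Proof. By Lemmas 7 and 8 and the usual bounds on the probabilities of typical elements,

$$\hat{p}_t(\underline{y}^{(j,1)}|\underline{x}^{(i,1)}) \le \frac{p_t(\underline{y}^{(j,1)})}{q_t(\underline{x}^{(i,1)})} = p(\underline{y}^{(j,1)}|\underline{x}^{(i,1)})\, \frac{p_t(\underline{x}^{(i,1)})\, p_t(\underline{y}^{(j,1)})}{p_t(\underline{x}^{(i,1)}, \underline{y}^{(j,1)})\, q_t(\underline{x}^{(i,1)})}$$
$$\le p(\underline{y}^{(j,1)}|\underline{x}^{(i,1)})\, \frac{2^{-N\left(I(X_t^{(i,1)}; Y_t^{(j,1)}) - 2a(\epsilon,t) - \epsilon(t)\right)}}{2^{-N\left(I(X_t^{(i,1)}; Y_t^{(j,1)}) + 2a(\epsilon,t) + \epsilon(t) + 1/N\right)}}$$
$$= p(\underline{y}^{(j,1)}|\underline{x}^{(i,1)})\, 2^{N(4a(\epsilon,t) + 2\epsilon(t) + 1/N)}.$$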



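Appendix III
Lemma 10

Lemma 10 bounds the conditional probability that $(\underline{X}_t^{(i,1)}, \underline{Y}_t^{(j,1)})$ is not jointly typical under the operation of $S(\underline{\mathcal{N}})$.

Lemma 10 For all $\underline{x}^{(i,1)} \in \underline{\mathcal{X}}^{(i,1)}$,

$$\hat{p}_t\big((\hat{A}_{\epsilon,t}^{(N)})^c \,\big|\, \underline{x}^{(i,1)}\big) \le p_t\big((\hat{A}_{\epsilon,t}^{(N)})^c \,\big|\, \underline{X}^{(i,1)} = \underline{x}^{(i,1)}\big) + e^{-2^{N\left(R - I(X_t^{(i,1)}; Y_t^{(j,1)}) - 2a(\epsilon,t) - \epsilon(t)\right)}}.$$

Proof. If $\left|-(1/N)\log p_t(\underline{x}^{(i,1)}) - H(X_t^{(i,1)})\right| > \epsilon(t)$ or $p_t\big((A_{\epsilon,t}^{(N)})^c \,\big|\, \underline{x}^{(i,1)}\big) > 2^{-3N\epsilon(t)}$, then

$$\hat{p}_t\big((\hat{A}_{\epsilon,t}^{(N)})^c \,\big|\, \underline{x}^{(i,1)}\big) = p_t\big((\hat{A}_{\epsilon,t}^{(N)})^c \,\big|\, \underline{x}^{(i,1)}\big) = 1$$

by definition of $\hat{A}_{\epsilon,t}^{(N)}$. Otherwise, $(\underline{x}^{(i,1)}, \underline{Y}^{(j,1)}) \notin \hat{A}_{\epsilon,t}^{(N)}$ when none of the $2^{NR}$ codewords of $\beta_{N,t}$ is jointly typical with $\underline{x}^{(i,1)}$. In this case, using definition (15) and following the proof of the rate-distortion theorem,

$$\hat{p}_t\big((\hat{A}_{\epsilon,t}^{(N)})^c \,\big|\, \underline{x}^{(i,1)}\big) = \left(1 - \sum_{\underline{y}^{(j,1)}} p_t(\underline{y}^{(j,1)})\, K_t(\underline{x}^{(i,1)}, \underline{y}^{(j,1)})\right)^{2^{NR}}$$
$$\stackrel{(a)}{\le} 1 - \sum_{\underline{y}^{(j,1)}} p(\underline{y}^{(j,1)}|\underline{x}^{(i,1)})\, K_t(\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) + e^{-2^{N\left(R - I(X_t^{(i,1)}; Y_t^{(j,1)}) - 2a(\epsilon,t) - \epsilon(t)\right)}}$$
$$= p_t\big((\hat{A}_{\epsilon,t}^{(N)})^c \,\big|\, \underline{X}^{(i,1)} = \underline{x}^{(i,1)}\big) + e^{-2^{N\left(R - I(X_t^{(i,1)}; Y_t^{(j,1)}) - 2a(\epsilon,t) - \epsilon(t)\right)}},$$

where (a) follows from $(1 - ab)^k \le 1 - a + e^{-bk}$ [15, Lemma 10.5.3] and the usual bounds on probabilities of typical strings

$$p_t(\underline{y}^{(j,1)}) = p(\underline{y}^{(j,1)}|\underline{x}^{(i,1)})\, \frac{p_t(\underline{y}^{(j,1)})\, p_t(\underline{x}^{(i,1)})}{p_t(\underline{x}^{(i,1)}, \underline{y}^{(j,1)})} \ge p(\underline{y}^{(j,1)}|\underline{x}^{(i,1)})\, 2^{-N\left(I(X_t^{(i,1)}; Y_t^{(j,1)}) + 2a(\epsilon,t) + \epsilon(t)\right)}$$

for all $(\underline{x}^{(i,1)}, \underline{y}^{(j,1)}) \in \hat{A}_{\epsilon,t}^{(N)}$.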
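The elementary inequality cited from [15, Lemma 10.5.3], $(1-ab)^k \le 1 - a + e^{-bk}$ for $0 \le a, b \le 1$ and $k > 0$, can be spot-checked on a grid. The grid and the values of `k` below are arbitrary illustrative choices, and a grid check is of course not a proof.

```python
import math

def lhs(a, b, k):
    """Left side (1 - a*b)^k of the inequality."""
    return (1 - a * b) ** k

def rhs(a, b, k):
    """Right side 1 - a + exp(-b*k) of the inequality."""
    return 1 - a + math.exp(-b * k)

grid = [i / 10 for i in range(11)]  # a, b in {0, 0.1, ..., 1}
ks = (1, 2, 5, 50, 1000)            # illustrative exponents

# Largest value of lhs - rhs over the grid; non-positive if the inequality holds.
worst_gap = max(lhs(a, b, k) - rhs(a, b, k)
                for a in grid for b in grid for k in ks)
print(worst_gap)
```

In the proof, $k = 2^{NR}$ and $b$ is exponentially small in $N$, so the term $e^{-bk}$ is doubly exponentially small whenever $R$ exceeds the exponent of $b$.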
A Theory of Network Equivalence

Part II: Multiterminal Channels 

Ralf Koetter, Fellow, IEEE, Michelle Effros, Fellow, IEEE, and Muriel Medard, Fellow, IEEE 



Abstract 

The equivalence tools used in Part I to study networks of independent, noisy, memoryless, point-to- 
point channels are here extended to networks containing more general channel types. Definitions of upper 
and lower bounding channel models are introduced. By these definitions, a collection of communication 
demands can be met on a network of independent channels if it can be met on a network where each 
channel is replaced by its lower bounding model and only if it can be met on a network where each 
channel is replaced by its upper bounding model. This work derives general conditions under which 
a network of noiseless bit pipes is an upper or lower bounding model for a multiterminal channel. 
Example upper and lower bounding models for broadcast, multiple access, and interference channels are 
given. It is then shown that bounding the difference between the upper and lower bounding models for 
a given channel yields bounds on the accuracy of network capacity bounds derived using those models. 
By bounding the capacity of a network of independent noisy channels by the network coding capacity 
of a network of noiseless bit pipes, this approach represents one step towards the goal of building 
computational tools for bounding network capacities. 

Keywords: Capacity, network coding, equivalence, component models 

I. Introduction 

This work is motivated by the desire to build computational tools for characterizing the capacities of 
networks. Traditionally, the information theoretic investigation of network capacities has proceeded largely 

R. Koetter was with the Technical University of Munich. M. Effros is with the Department of Electrical Engineering, 
California Institute of Technology, Pasadena, CA 91125 USA e-mail: effros@caltech.edu. M. Medard is with the Department 
of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA e-mail: 
medard@mit.edu. 

Submitted to IEEE Transactions on Information Theory April 14, 2010 DRAFT 







Fig. 1. Separate network and channel coding fails to achieve the unicast capacity of (a) a four-node network with dependent 
noise at the receivers of the broadcast channel and (b) an (m + 2)-node network with independent noise at the receivers of the 
broadcast channel. 



by studying example networks. Shannon's original proof of the capacity of a network described by a single point-to-point channel [1] was followed by Ahlswede's [2] and Liao's [3] capacity derivations for a single multiple access channel, Cover's early work on a single broadcast channel [4], and so on. While the solution to one network capacity problem may lend some insight into future problems, deriving the capacity of each new network is often difficult. As a result, even the capacities for three-node networks remain incompletely solved.

The problem is further complicated by the fact that the capacities of individual channels can vastly underestimate the rates that those channels can carry in larger networks. For example, consider the network in Figure 1(a), where a broadcast channel $p(y^{(2)}, y^{(3)}|x^{(1)})$ is followed by a multiple access channel $p(y^{(4)}|x^{(2)}, x^{(3)})$. The two channels are independent, giving

$$p(y^{(2)}, y^{(3)}, y^{(4)}|x^{(1)}, x^{(2)}, x^{(3)}) = p(y^{(2)}, y^{(3)}|x^{(1)})\, p(y^{(4)}|x^{(2)}, x^{(3)}).$$

Example 1 shows that the maximal rate for a single unicast demand from source node 1 to sink node 4 can far exceed the maximal sum-rate in the broadcast channel's capacity region. Example 2 provides another related example. Both examples show that reliable transmission across a network does not require reliable transmission across each channel in the network and that restricting each component to transmit reliably (that is, employing a separated network and channel coding strategy that makes each channel individually reliable) sometimes decreases the network capacity.

Example 1 Figure 1(a) shows a four-node network comprising a Gaussian broadcast channel followed by a real additive multiple access channel. The broadcast channel has power constraint $E[(X^{(1)})^2] \le P$ and channel outputs $Y^{(2)} = X^{(1)} + Z^{(2)}$ and $Y^{(3)} = X^{(1)} + Z^{(3)}$, where $Z^{(2)}$ and $Z^{(3)}$ are statistically dependent mean-0, variance-$N$ random variables with $Z^{(3)} = -Z^{(2)}$, and $P$ and $N$ are real-valued positive constants. The multiple access channel has power constraints $E[(X^{(2)})^2], E[(X^{(3)})^2] \le P + N$ at each transmitter and output $Y^{(4)} = X^{(2)} + X^{(3)}$. We consider a single unicast demand, where node 1 wishes to reliably transmit information to node 4. If we channel code to make each channel reliable and then apply network coding, the achievable rate cannot exceed the broadcast channel's maximal sum rate

$$\max_{\alpha \in [0,1]} \left[\frac{1}{2}\log\left(1 + \frac{\alpha P}{N}\right) + \frac{1}{2}\log\left(1 + \frac{(1-\alpha)P}{\alpha P + N}\right)\right] = \frac{1}{2}\log\left(1 + \frac{P}{N}\right).$$

Yet the network's unicast capacity is infinite since nodes 2 and 3 can simply retransmit their channel outputs uncoded to give output $Y^{(4)} = (X^{(1)} + Z^{(2)}) + (X^{(1)} + Z^{(3)}) = 2X^{(1)}$ at node 4. ■
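Both claims in Example 1 are easy to check numerically. The sketch below evaluates the broadcast sum-rate expression for several power splits and simulates the uncoded relaying strategy. The parameter values, the Gaussian choice for the noise, and the function names are assumptions made for illustration only.

```python
import math
import random

def bc_sum_rate(alpha, P, N):
    """Sum rate (bits per use) of the Gaussian broadcast channel for power split alpha."""
    private = 0.5 * math.log2(1 + alpha * P / N)
    common = 0.5 * math.log2(1 + (1 - alpha) * P / (alpha * P + N))
    return private + common

def relay_uncoded(x, N, rng):
    """Nodes 2 and 3 forward their outputs unchanged; returns Y^(4).

    The noise at node 3 is the negative of the noise at node 2, as in
    Example 1. The noise is drawn Gaussian here purely for illustration.
    """
    z2 = rng.gauss(0.0, math.sqrt(N))
    z3 = -z2
    y2, y3 = x + z2, x + z3
    return y2 + y3

P, N = 4.0, 1.0
rates = [bc_sum_rate(k / 10, P, N) for k in range(11)]
print(rates[0], rates[5], rates[10])           # identical for every split
print(relay_uncoded(1.5, N, random.Random(0)))  # equals 2 * 1.5 up to rounding
```

The sum rate does not depend on the power split because the two logarithms telescope to $\frac{1}{2}\log((P+N)/N)$, while the relayed signal is noise-free for every input.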

It is tempting to believe that the gap between the optimal performance and the performance achieved by separate network and channel coding in Example 1 arises due to the unusual statistical dependence in the noise. Unfortunately, similar phenomena can also arise when the noise at the receivers of a broadcast channel is independent, as shown in Example 2.

Example 2 Figure 1(b) shows an $(m+2)$-node network made from a Gaussian broadcast channel and a real additive multiple access channel. The broadcast channel has power constraint $E[(X^{(1)})^2] \le P$ and channel outputs $Y^{(i)} = X^{(1)} + Z^{(i)}$, $i \in \{2, \ldots, m+1\}$, where $Z^{(i)}$ are independent mean-0, variance-$N$ Gaussian random variables, and $P$ and $N$ are real-valued positive constants. The multiple access channel has power constraint $E[(X^{(i)})^2] \le P + N$ at each transmitter $i \in \{2, \ldots, m+1\}$ and output $Y^{(m+2)} = \sum_{i=2}^{m+1} X^{(i)}$. We consider a single unicast demand, where node 1 wishes to reliably transmit information to node $(m+2)$. The maximal achievable unicast rate using separate network and channel codes is bounded by the broadcast channel's maximal sum rate

$$\max_{\alpha:\ \sum_{i=2}^{m+1}\alpha_i = 1}\ \sum_{i=2}^{m+1} \frac{1}{2}\log\left(1 + \frac{\alpha_i P}{N + \sum_{j=2}^{i-1}\alpha_j P}\right) = \frac{1}{2}\log\left(1 + \frac{P}{N}\right).$$

The unicast capacity of the network is greater than or equal to

$$\frac{1}{2}\log\left(1 + \frac{mP}{N}\right)$$

since nodes 2 through $m+1$ can simply retransmit their channel outputs uncoded to give output

$$Y^{(m+2)} = \sum_{i=2}^{m+1}\left(X^{(1)} + Z^{(i)}\right) = mX^{(1)} + \sum_{i=2}^{m+1} Z^{(i)},$$

which is a Gaussian channel with power $E[(mX^{(1)})^2] = m^2 P$ and noise variance $E[(\sum_{i=2}^{m+1} Z^{(i)})^2] = mN$. Thus the gap between the optimal performance and the lower bound achieved through the use of a separated strategy is sometimes large even in networks with independent noise. ■
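The two rates compared in Example 2 can be tabulated directly. The following sketch evaluates the telescoping broadcast sum rate for an arbitrary power split and the rate of the uncoded relaying strategy. The helper names and the values of `P`, `N`, and `m` are illustrative assumptions.

```python
import math

def bc_sum_rate(alphas, P, N):
    """Sum rate of the degraded Gaussian broadcast channel for power fractions alphas.

    Each term treats the power allocated to earlier terms as interference,
    which mirrors the sum in the displayed expression.
    """
    total, interference = 0.0, 0.0
    for a in alphas:
        total += 0.5 * math.log2(1 + a * P / (N + interference))
        interference += a * P
    return total

def separated_rate_bound(P, N):
    """Upper bound on the rate of separate network and channel coding."""
    return 0.5 * math.log2(1 + P / N)

def uncoded_relay_rate(m, P, N):
    """Rate of the Gaussian channel with signal power m^2 * P and noise variance m * N."""
    return 0.5 * math.log2(1 + (m**2 * P) / (m * N))

P, N = 1.0, 1.0
for m in (2, 10, 100):
    print(m, separated_rate_bound(P, N), uncoded_relay_rate(m, P, N))
```

The separated bound stays fixed while the uncoded rate grows like $\frac{1}{2}\log m$, which is the sense in which the gap can be large.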



Given the difficulty of solving network capacities even for small networks and the failure of individual 
channel capacities to predict the capacity of networks made from those channels, the gap between the 
size of the networks whose capacities we can analyze and the size of the networks over which we 
communicate in practice seems to be growing ever larger. To address this challenge, we here propose a 
strategy for bounding the behaviors of individual channels in a manner that captures their full range of 
behaviors in larger network systems. That is, we derive upper and lower bounding models on individual 
channels such that the capacity region of any network that contains the given channel is bounded below 
by the capacity region of a network that replaces that component by its lower bounding model and 
bounded above by the capacity region of a network that replaces that component by its upper bounding 
model. Thus, an arbitrary collection of demands (e.g., a collection of unicasts) can be met on a given 
network if it can be met on the network that replaces channels by their lower bounding models and only
if it can be met on the network that replaces channels by their upper bounding models. 

We focus on upper and lower bounding models comprised of noiseless bit pipes. Using such models, we 
can bound the capacity of a network of noisy channels by the network coding capacity of the network 
that replaces each channel by its noiseless model. While network coding capacities are not solved in the 
general case, a variety of computational tools can be used to bound them. (See, for example, [5], [6], [7].)

Part I [8] in this two-part series derived upper and lower bounding models for point-to-point channels. 
In that case, the upper and lower bounds were identical. We here derive upper and lower bounds for 
more general channel types using the same basic strategy: We demonstrate that the capacity region of 
one network is a subset of that of another network by showing that solutions for the first network can be 
run on the second network. Sections II and III include the problem setup and channel model definitions.
Section IV derives sufficient conditions for upper and lower bounding models. We derive upper and
lower bounding models for broadcast, multiple access, and interference channels as examples. When 
a channel's upper and lower bounding models differ, we bound the accuracy of the resulting capacity 
bounds by comparing the upper and lower bounding models. Such accuracy bounds may be useful both 
directly and for determining which larger network components should be modeled in the future. 



II. The Setup 

We use the notation established in [9]. Network $\mathcal{N}$ has $m$ nodes, $V = \{1, \ldots, m\}$. Each node transmits an input random variable $X^{(v)} \in \mathcal{X}^{(v)}$ and receives an output random variable $Y^{(v)} \in \mathcal{Y}^{(v)}$. We use $X = (X^{(v)} : v \in V)$ and $Y = (Y^{(v)} : v \in V)$ to denote the vectors of network inputs and outputs. The alphabets may be discrete or continuous. The network is assumed to be memoryless and to be characterized by a conditional probability distribution

$$p(y|x) = p(y^{(1)}, \ldots, y^{(m)}|x^{(1)}, \ldots, x^{(m)}).$$

Applying a result from [10], we characterize rate regions for arbitrary demands by characterizing the multiple unicast rate region. This choice simplifies the notation and yields no loss of generality (see [9]). Thus a blocklength-$n$ code communicates message

$$W^{(u \to v)} \in \mathcal{W}^{(u \to v)} \stackrel{\text{def}}{=} \{1, \ldots, 2^{nR^{(u \to v)}}\}$$

from node $u$ to node $v$ for each $u, v \in \{1, \ldots, m\}$. Messages $W = (W^{(u \to v)} : (u, v) \in \{1, \ldots, m\}^2)$ are independent and uniformly distributed (though the proof goes through if the same message is available at multiple nodes). By assumption, $\mathcal{R} = (R^{(u \to v)} : (u, v) \in \{1, \ldots, m\}^2)$ satisfies $R^{(v \to v)} = 0$ for all $v$.

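At time $t$, node $v$ transmits $X_t^{(v)}$ and receives $Y_t^{(v)}$. We therefore describe the network by a triple

$$\mathcal{N} \stackrel{\text{def}}{=} \left(\prod_{v=1}^{m} \mathcal{X}^{(v)},\ p(y|x),\ \prod_{v=1}^{m} \mathcal{Y}^{(v)}\right) \qquad (1)$$

with the causality constraint that $X_t^{(v)}$ is a function only of

$$\{Y_1^{(v)}, \ldots, Y_{t-1}^{(v)}, W^{(v \to 1)}, \ldots, W^{(v \to m)}\}.$$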

For the purposes of this paper, network $\mathcal{N}$ is arbitrary except for its inclusion of an independent channel $C$, as shown in Figure 2. To make this precise, let $V_1, V_2 \subseteq \{1, \ldots, m\}$, $V_1 \cap V_2 = \emptyset$, denote the nodes transmitting to and receiving from channel $C$, respectively. For example, a broadcast channel $C$ has a single transmitter $V_1 = \{i\}$ and multiple receivers $V_2 = \{j_1, \ldots, j_k\}$, a multiple access channel has multiple transmitters $V_1 = \{i_1, \ldots, i_k\}$ and a single receiver $V_2 = \{j\}$, and so on. Since each node $v \in V_1$ may transmit over both $C$ and the remainder of the network and each node $v \in V_2$ may receive information both from $C$ and from the remainder of the network, we define $\mathcal{X}^{(v)} = \mathcal{X}^{(v,1)} \times \mathcal{X}^{(v,2)}$ for $v \in V_1$ and $\mathcal{Y}^{(v)} = \mathcal{Y}^{(v,1)} \times \mathcal{Y}^{(v,2)}$ for $v \in V_2$. We then use $X^{V_1} \in \mathcal{X}^{V_1}$ and $Y^{V_2} \in \mathcal{Y}^{V_2}$ to denote the input and output to channel $C$ and $X^{-V_1} \in \mathcal{X}^{-V_1}$ and $Y^{-V_2} \in \mathcal{Y}^{-V_2}$ to denote the input and output to the remainder of the network. The respective alphabets are given by

$$\mathcal{X}^{V_1} = \prod_{v \in V_1} \mathcal{X}^{(v,1)}, \qquad \mathcal{Y}^{V_2} = \prod_{v \in V_2} \mathcal{Y}^{(v,1)},$$

$$\mathcal{X}^{-V_1} = \prod_{v \in V_1} \mathcal{X}^{(v,2)} \times \prod_{v \notin V_1} \mathcal{X}^{(v)}, \qquad \mathcal{Y}^{-V_2} = \prod_{v \in V_2} \mathcal{Y}^{(v,2)} \times \prod_{v \notin V_2} \mathcal{Y}^{(v)}.$$

Fig. 2. An $m$-node network containing a channel $p(y^{V_2}|x^{V_1}) = p(y^{(j_1,1)}, y^{(j_2,1)}|x^{(i_1,1)}, x^{(i_2,1)})$ from nodes $V_1 = \{i_1, i_2\}$ to nodes $V_2 = \{j_1, j_2\}$. The distribution $p(y^{-V_2}|x^{-V_1})$ on the remaining channel outputs given the remaining channel inputs is arbitrary.



The independence of channel $C$ from the rest of the network implies a factorization of the conditional distribution $p(y|x)$, giving network characterization

$$\mathcal{N} = \left(\mathcal{X}^{-V_1} \times \mathcal{X}^{V_1},\ p(y^{-V_2}|x^{-V_1})\, p(y^{V_2}|x^{V_1}),\ \mathcal{Y}^{-V_2} \times \mathcal{Y}^{V_2}\right),$$

again with the constraint that random variable $X_t^{(v)}$ is a function of random variables $\{Y_1^{(v)}, \ldots, Y_{t-1}^{(v)}, W^{(v \to 1)}, \ldots, W^{(v \to m)}\}$ alone.

The following definitions are identical to those in [9], which describes them in greater detail.



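Definition 1 Let a network

$$\mathcal{N} \stackrel{\text{def}}{=} \left(\prod_{v=1}^{m} \mathcal{X}^{(v)},\ p(y|x),\ \prod_{v=1}^{m} \mathcal{Y}^{(v)}\right)$$

be given. A blocklength-$n$ solution $S(\mathcal{N})$ for this network is a set of encoding and decoding functions:

$$X_t^{(v)} : (\mathcal{Y}^{(v)})^{t-1} \times \prod_{v'=1}^{m} \mathcal{W}^{(v \to v')} \to \mathcal{X}^{(v)}$$

$$\hat{W}^{(u \to v)} : (\mathcal{Y}^{(v)})^{n} \times \prod_{v'=1}^{m} \mathcal{W}^{(v \to v')} \to \mathcal{W}^{(u \to v)}$$

mapping $(Y_1^{(v)}, \ldots, Y_{t-1}^{(v)}, W^{(v \to 1)}, \ldots, W^{(v \to m)})$ to $X_t^{(v)}$ for each $v \in V$ and $t \in \{1, \ldots, n\}$ and mapping $(Y_1^{(v)}, \ldots, Y_n^{(v)}, W^{(v \to 1)}, \ldots, W^{(v \to m)})$ to $\hat{W}^{(u \to v)}$ for each $u, v \in V$. The solution $S(\mathcal{N})$ is called a $(\lambda, \mathcal{R})$-solution, denoted $(\lambda, \mathcal{R})$-$S(\mathcal{N})$, if $\Pr(W^{(u \to v)} \ne \hat{W}^{(u \to v)}) < \lambda$ for all source and sink pairs $u, v$ using the specified encoding and decoding functions.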

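Definition 2 The rate region $\mathscr{R}(\mathcal{N}) \subset \mathbb{R}_+^{m(m-1)}$ of a network $\mathcal{N}$ is the closure of all rate vectors $\mathcal{R}$ such that for any $\lambda > 0$ and all $n$ sufficiently large, there exists a $(\lambda, \mathcal{R})$-$S(\mathcal{N})$ solution of blocklength $n$. We use $\mathrm{int}(\mathscr{R}(\mathcal{N}))$ to denote the interior of rate region $\mathscr{R}(\mathcal{N})$.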

Given a network $\mathcal{N}$, the $N$-fold stacked network $\underline{\mathcal{N}}$ contains $N$ copies of $\mathcal{N}$ and delivers $N$ independent messages $W^{(u \to v)}$ for each $(u, v)$. We carry over notation and variable definitions from the network $\mathcal{N}$ to the stacked network $\underline{\mathcal{N}}$ by underlining the variable names. So $\underline{W}^{(u \to v)} \in \underline{\mathcal{W}}^{(u \to v)} \stackrel{\text{def}}{=} (\mathcal{W}^{(u \to v)})^N$ is the $N$-dimensional vector of messages that the $N$ copies of node $u$ send to the corresponding copies of node $v$, and $\underline{X}_t^{(v)} \in \underline{\mathcal{X}}^{(v)} \stackrel{\text{def}}{=} (\mathcal{X}^{(v)})^N$ and $\underline{Y}_t^{(v)} \in \underline{\mathcal{Y}}^{(v)} \stackrel{\text{def}}{=} (\mathcal{Y}^{(v)})^N$ are the $N$-dimensional vectors of network inputs and network outputs, respectively, for node $v$ at time $t$. The variables in the $\ell$-th layer of the stack are denoted by an argument $\ell$; for example, $\underline{W}^{(u \to v)}(\ell)$ is the message from node $u$ to node $v$ in the $\ell$-th layer of the stack and $\underline{X}_t^{(v)}(\ell)$ is the layer-$\ell$ channel input from node $v$ at time $t$. The rate $R^{(u \to v)}$ for a stacked network equals $(\log |\underline{\mathcal{W}}^{(u \to v)}|)/(nN)$; this normalization makes rate regions in a network and its corresponding stacked network comparable.



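Definition 3 Let a network

$$\mathcal{N} \stackrel{\text{def}}{=} \left(\prod_{v=1}^{m} \mathcal{X}^{(v)},\ p(y|x),\ \prod_{v=1}^{m} \mathcal{Y}^{(v)}\right)$$

be given. Let $\underline{\mathcal{N}}$ be the $N$-fold stacked network for $\mathcal{N}$. A blocklength-$n$ solution $S(\underline{\mathcal{N}})$ to this network is defined as a set of encoding and decoding functions

$$\underline{X}_t^{(v)} : (\underline{\mathcal{Y}}^{(v)})^{t-1} \times \prod_{v'=1}^{m} \underline{\mathcal{W}}^{(v \to v')} \to \underline{\mathcal{X}}^{(v)}$$

$$\hat{\underline{W}}^{(u \to v)} : (\underline{\mathcal{Y}}^{(v)})^{n} \times \prod_{v'=1}^{m} \underline{\mathcal{W}}^{(v \to v')} \to \underline{\mathcal{W}}^{(u \to v)}$$

mapping $(\underline{Y}_1^{(v)}, \ldots, \underline{Y}_{t-1}^{(v)}, \underline{W}^{(v \to 1)}, \ldots, \underline{W}^{(v \to m)})$ to $\underline{X}_t^{(v)}$ for each $t \in \{1, \ldots, n\}$ and $v \in \{1, \ldots, m\}$ and mapping $(\underline{Y}_1^{(v)}, \ldots, \underline{Y}_n^{(v)}, \underline{W}^{(v \to 1)}, \ldots, \underline{W}^{(v \to m)})$ to $\hat{\underline{W}}^{(u \to v)}$ for each $u, v \in \{1, \ldots, m\}$. The solution $S(\underline{\mathcal{N}})$ is called a $(\lambda, \mathcal{R})$-solution for $\underline{\mathcal{N}}$, denoted $(\lambda, \mathcal{R})$-$S(\underline{\mathcal{N}})$, if the encoding and decoding functions imply $\Pr(\underline{W}^{(u \to v)} \ne \hat{\underline{W}}^{(u \to v)}) < \lambda$ for all source and sink pairs $u, v$.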



Definition 4 The rate region $\mathscr{R}(\underline{\mathcal{N}}) \subset \mathbb{R}_+^{m(m-1)}$ of a stacked network $\underline{\mathcal{N}}$ is the closure of all rate vectors $\mathcal{R}$ such that a $(\lambda, \mathcal{R})$-$S(\underline{\mathcal{N}})$ solution exists for any $\lambda > 0$ and all $N$ sufficiently large.

Theorem 1 from [9], reproduced below, shows that if the messages $\underline{W}^{(u \to v)}$ are channel coded before transmission, then any rate $\mathcal{R}$ that can be achieved across a stacked network can be achieved by a code that applies the same solution independently in each layer. Such solutions are called stacked solutions. A formal definition of stacked solutions follows. Since stacked solutions are optimal by Theorem 1, there is no loss of generality in restricting our attention to stacked solutions going forward.

Definition 5 Let a network M d = (n^i^^KyW^rGli^) be given. Let }£be the N -fold stacked 
network forJ\f. A blocklength-n stacked solution S_(M_) to network J\f is defined as a set of mappings 

m 
x (v) . (yW)t-l x JJ yv;(«-W) ^ x {v) 

v'=l 
m 



such that 



^-*0 = ^(u-^) ( ^ ( „^ 



Xf >(€) = X t W (y^ (*), . . . , y£\(*), W {v ^\£), . . . ^ m) (£)) 

foreachu, v G {1, . . . , m}, i G {1, . . . , n}, and^ € {1, ... , iV}. Here (W ,W_ ) is the blocklength- 

N channel code for the message from u to v, X\ v is the node-v single-layer encoder at time t, and W \ u ~^ v > 
is the node-v single-layer decoder at time t. The solution S_(M_) is called a stacked (A, TVj-solution, denoted 
(X,TZ)-S(M),ifthe specified mappings imply Pr(W {u ^ v) ^w} u ^ v) ) < A for all pairs (u,v) G V 2 . 

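Definition 6 The rate region $\mathscr{R}(\underline{\mathcal{N}}) \subset \mathbb{R}_+^{m(m-1)}$ of a stacked network $\underline{\mathcal{N}}$ is the closure of all rate vectors $\mathcal{R}$ such that a stacked $(\lambda, \mathcal{R})$-$\underline{S}(\underline{\mathcal{N}})$ solution exists for any $\lambda > 0$ and all $N$ sufficiently large.

Theorem 1 [9, Theorem 1] The rate regions $\mathscr{R}(\mathcal{N})$ and $\mathscr{R}(\underline{\mathcal{N}})$ are identical, and for each $\mathcal{R} \in \mathrm{int}(\mathscr{R}(\underline{\mathcal{N}}))$, there exists a sequence of blocklength-$n$ $(2^{-N\delta}, \mathcal{R})$-$\underline{S}(\underline{\mathcal{N}})$ stacked solutions for $\underline{\mathcal{N}}$ for some $n \ge 1$ and $\delta > 0$.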



III. Bit-Pipe Models 

The equivalence tools derived below relate the rate region of a network $\mathcal{N}$ to those of a network $\mathcal{N}(\mathcal{R}_C)$ in which channel $C$ is replaced by a bit-pipe model $C(\mathcal{R}_C)$ corresponding to some rate vector $\mathcal{R}_C$. We here define $\mathcal{R}_C$ and $C(\mathcal{R}_C)$ for a generic channel $C$ with input nodes $V_1$ and output nodes $V_2$. Figure 3 illustrates these definitions for two example channels. Let

$$\mathcal{M} = \{(A, B) : A \subseteq V_1,\ B \subseteq V_2,\ A, B \ne \emptyset\}$$

$$\mathcal{R}_C = (R^{(A \to B)} : (A, B) \in \mathcal{M}).$$

For each $(A, B) \in \mathcal{M}$, bit-pipe model $C(\mathcal{R}_C)$, defined formally below, delivers rate $R^{(A \to B)}$ from transmitter set $A$ to receiver set $B$. When $|A| = 1$, $A$ transmits directly to each node in $B$. When $|A| > 1$, each node $i \in A$ delivers $\log |\mathcal{X}^{(i,1)}|$ bits (i.e., a symbol from alphabet $\mathcal{X}^{(i,1)}$) to an internal node $v_A$, which delivers $R^{(A \to B)}$ bits to each node in $B$.
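To make the index set $\mathcal{M}$ concrete, the sketch below enumerates the transmitter-set/receiver-set pairs for the broadcast and multiple access channels of Figure 3, along with the transmitter sets that need an internal node. The node labels and function names are hypothetical and serve only to illustrate the definitions.

```python
from itertools import combinations

def nonempty_subsets(nodes):
    """All nonempty subsets of a node collection, as frozensets."""
    nodes = sorted(nodes)
    return [frozenset(c)
            for r in range(1, len(nodes) + 1)
            for c in combinations(nodes, r)]

def index_set(V1, V2):
    """The set M of pairs (A, B) with A a nonempty subset of V1 and B of V2."""
    return [(A, B) for A in nonempty_subsets(V1) for B in nonempty_subsets(V2)]

def internal_nodes(V1):
    """Transmitter sets A with |A| > 1, each of which gets an internal node v_A."""
    return [A for A in nonempty_subsets(V1) if len(A) > 1]

# Broadcast channel: one transmitter, two receivers.
bc_pairs = index_set({"i"}, {"j1", "j2"})
# Multiple access channel: two transmitters, one receiver.
mac_pairs = index_set({"i1", "i2"}, {"j"})

for A, B in bc_pairs + mac_pairs:
    print(sorted(A), "->", sorted(B))
print("internal nodes for the MAC:", [sorted(A) for A in internal_nodes({"i1", "i2"})])
```

Each example yields three rates: a common rate and two private rates for the broadcast channel, and two individual rates plus one shared rate (routed through a single internal node) for the multiple access channel.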

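Definition 7 The bit-pipe model $C(\mathcal{R}_C)$ is defined as

$$C(\mathcal{R}_C) \stackrel{\text{def}}{=} \left(\hat{\mathcal{X}}^{V_0} \times \hat{\mathcal{X}}^{V_1},\ \hat{p}(\hat{y}^{V_0}, \hat{y}^{V_2}|\hat{x}^{V_0}, \hat{x}^{V_1}),\ \hat{\mathcal{Y}}^{V_0} \times \hat{\mathcal{Y}}^{V_2}\right), \qquad (2)$$

where $\hat{x}^{V_1}$ and $\hat{y}^{V_2}$ are the network inputs and outputs for the nodes in $V_1$ and $V_2$, and $\hat{x}^{V_0}$ and $\hat{y}^{V_0}$ are the network inputs and outputs for the internal nodes $V_0 = \{v_A : A \subseteq V_1, |A| > 1\}$. For each $A \subseteq V_1$ with $|A| > 1$ and $i \in A$, node $v_A$ receives copy $\hat{y}^{(v_A, i)}$ of $\hat{x}^{(i,1)}$. For each $(A, B) \in \mathcal{M}$ and $j \in B$, node $j$ receives copy $\hat{y}^{(A \to B, j)}$ of $\hat{x}^{(A \to B)}$. Therefore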

X (A^B) def {()il} ^-» y Vo def j>(^) 



<* - LLacVi:\A\>1 A: y - llieA-* 



* {v) = I\bcv* {A ^ B) 



bcv 2 
P (y v °,y v *\x v °,^)^( [J Y[5(y^-x^)\ ( JJ jj^bjj.^b); 

\( J 4,S)eX:|A|>lieA / \(A,B)sMj'eB 



Fig. 3. Bit-pipe models $C(\mathcal{R}_C)$ for (a) the broadcast channel with $V_1 = \{i\}$ and $V_2 = \{j_1, j_2\}$, and (b) the multiple access channel with $V_1 = \{i_1, i_2\}$ and $V_2 = \{j\}$. For the broadcast channel, $\mathcal{R}_C = (R^{(\{i\} \to \{j_1, j_2\})}, R^{(\{i\} \to \{j_1\})}, R^{(\{i\} \to \{j_2\})})$ describes a common information rate to be delivered to both receivers and a private information rate for each receiver. For the multiple access channel, $\mathcal{R}_C = (R^{(\{i_1\} \to \{j\})}, R^{(\{i_2\} \to \{j\})}, R^{(\{i_1, i_2\} \to \{j\})})$ describes an individual information rate from each transmitter and a shared information rate from the pair of transmitters.



Since any network $\mathcal{N}(\mathcal{R}_C)$ interacts with $C(\mathcal{R}_C)$ only through nodes $V_1$ and $V_2$ and does not have direct access to the nodes in $V_0$, the remainder of this paper abuses notation by replacing (2) by

$$C(\mathcal{R}_C) = \left(\hat{\mathcal{X}}^{V_1},\ \hat{p}(\hat{y}^{V_2}|\hat{x}^{V_1}),\ \hat{\mathcal{Y}}^{V_2}\right). \qquad (3)$$

In another common abuse of notation, we allow non-integer values of $R^{(A \to B)}$ to designate capacitated bit-pipes that require more than a single channel use to deliver some integer number of bits. Applying the stacking approach from the previous section, the arguments that follow transmit information over $N$ copies of each bit pipe in the stacked network, giving alphabet $\underline{\hat{\mathcal{X}}}^{(A \to B)} \stackrel{\text{def}}{=} \{0, 1\}^{N R^{(A \to B)}}$.

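Definition 8 Bit-pipe model $C(\mathcal{R}_C) = (\hat{\mathcal{X}}^{V_1}, \hat{p}(\hat{y}^{V_2}|\hat{x}^{V_1}), \hat{\mathcal{Y}}^{V_2})$ is a lower-bounding model for channel $C = (\mathcal{X}^{V_1}, p(y^{V_2}|x^{V_1}), \mathcal{Y}^{V_2})$, written $C(\mathcal{R}_C) \subseteq C$, if and only if $\mathscr{R}(\mathcal{N}(\mathcal{R}_C)) \subseteq \mathscr{R}(\mathcal{N})$ for all

$$\mathcal{N} = \left(\mathcal{X}^{V_1} \times \mathcal{X}^{-V_1},\ p(y^{V_2}|x^{V_1})\, p(y^{-V_2}|x^{-V_1}),\ \mathcal{Y}^{V_2} \times \mathcal{Y}^{-V_2}\right)$$
$$\mathcal{N}(\mathcal{R}_C) = \left(\hat{\mathcal{X}}^{V_1} \times \mathcal{X}^{-V_1},\ \hat{p}(\hat{y}^{V_2}|\hat{x}^{V_1})\, p(y^{-V_2}|x^{-V_1}),\ \hat{\mathcal{Y}}^{V_2} \times \mathcal{Y}^{-V_2}\right).$$

Definition 9 Bit-pipe model $C(\mathcal{R}_C) = (\hat{\mathcal{X}}^{V_1}, \hat{p}(\hat{y}^{V_2}|\hat{x}^{V_1}), \hat{\mathcal{Y}}^{V_2})$ is an upper-bounding model for channel $C = (\mathcal{X}^{V_1}, p(y^{V_2}|x^{V_1}), \mathcal{Y}^{V_2})$, written $C \subseteq C(\mathcal{R}_C)$, if and only if $\mathscr{R}(\mathcal{N}) \subseteq \mathscr{R}(\mathcal{N}(\mathcal{R}_C))$ for all

$$\mathcal{N} = \left(\mathcal{X}^{V_1} \times \mathcal{X}^{-V_1},\ p(y^{V_2}|x^{V_1})\, p(y^{-V_2}|x^{-V_1}),\ \mathcal{Y}^{V_2} \times \mathcal{Y}^{-V_2}\right)$$
$$\mathcal{N}(\mathcal{R}_C) = \left(\hat{\mathcal{X}}^{V_1} \times \mathcal{X}^{-V_1},\ \hat{p}(\hat{y}^{V_2}|\hat{x}^{V_1})\, p(y^{-V_2}|x^{-V_1}),\ \hat{\mathcal{Y}}^{V_2} \times \mathcal{Y}^{-V_2}\right).$$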

The following lemma shows the continuity of network capacity in the rate of any bit pipes it contains. 






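Lemma 2 [9, Lemma 2] Consider any network

$$\mathcal{N}_R = \left(\mathcal{X}^{-V_1} \times \mathcal{X}^{V_1},\ p(y^{-V_2}|x^{-V_1})\, p(y^{V_2}|x^{V_1}),\ \mathcal{Y}^{-V_2} \times \mathcal{Y}^{V_2}\right)$$

with $V_1 = \{i\}$ and $V_2 = \{j\}$ connected by a rate-$R$ bit pipe

$$\left(\mathcal{X}^{V_1},\ p(y^{V_2}|x^{V_1}),\ \mathcal{Y}^{V_2}\right) = \left(\{0,1\}^R,\ \delta(y^{V_2} - x^{V_1}),\ \{0,1\}^R\right).$$

Rate region $\mathscr{R}(\mathcal{N}_R)$ is continuous in $R$ for all $R > 0$. ■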

IV. The Equivalence Tools 

Given any network $\mathcal{N}$ containing channel $C$, let $\mathcal{N}(\mathcal{R}_C)$ be the network achieved by replacing $C$ by $C(\mathcal{R}_C)$ in $\mathcal{N}$. We here derive conditions under which $\mathscr{R}(\mathcal{N}(\mathcal{R}_C)) \subseteq \mathscr{R}(\mathcal{N})$ (i.e., $C(\mathcal{R}_C)$ is a lower bounding model for $C$) or $\mathscr{R}(\mathcal{N}) \subseteq \mathscr{R}(\mathcal{N}(\mathcal{R}_C))$ (i.e., $C(\mathcal{R}_C)$ is an upper bounding model for $C$).

Lemma 3, below, uses channel coding arguments to derive lower bounding models. The proof runs a code $\underline{S}(\underline{\mathcal{N}}(\mathcal{R}_C))$ across network $\underline{\mathcal{N}}$ with the aid of a rate-$\mathcal{R}_C$ channel code for $C$. The resulting error probability approximates the error probability of $\underline{S}(\underline{\mathcal{N}}(\mathcal{R}_C))$ on $\underline{\mathcal{N}}(\mathcal{R}_C)$ provided that the probability of channel coding error is small. We therefore begin by defining channel codes for a generic channel $C$.

Given a channel $C$ with input nodes $V_1$ and output nodes $V_2$, a channel code for $C$ is a mechanism for reliably delivering some collection of rates $(R^{(\{i\} \to B)} : i \in V_1, B \subseteq V_2)$ from each transmitter $i \in V_1$ to each subset of receivers $B \subseteq V_2$. For example, a channel code for broadcast channel $C$ with transmitter $V_1 = \{i\}$ and receivers $V_2 = \{j_1, j_2\}$ delivers common information at rate $R^{(\{i\} \to \{j_1, j_2\})}$ and private information at rates $R^{(\{i\} \to \{j_1\})}$ and $R^{(\{i\} \to \{j_2\})}$ for some $R^{(\{i\} \to \{j_1, j_2\})}, R^{(\{i\} \to \{j_1\})}, R^{(\{i\} \to \{j_2\})} \ge 0$. Since there is no mechanism for delivering messages from a set of transmitters, we define channel codes only for rates $\mathcal{R}_C$ that satisfy $R^{(A \to B)} = 0$ for all $(A, B) \in \mathcal{M}$ with $|A| > 1$.¹

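Definition 10 Given a channel $C = (\mathcal{X}^{V_1}, p(y^{V_2}|x^{V_1}), \mathcal{Y}^{V_2})$, let $\mathcal{R}_C$ be a rate vector with $R^{(A \to B)} = 0$ for all $(A, B) \in \mathcal{M}$ with $|A| > 1$. For any $N \ge 1$, a $(2^{N\mathcal{R}_C}, N)$ channel code $(\alpha_N, \beta_N)$ for channel $C$ defines a collection of encoding functions $\alpha_N = (\alpha_N^{(i)} : i \in V_1)$ and decoding functions $\beta_N = (\beta_N^{(\{i\} \to B), j} : (\{i\}, B) \in \mathcal{M}, j \in B)$ with

$$\alpha_N^{(i)} : \prod_{B \subseteq V_2} \underline{\hat{\mathcal{X}}}^{(\{i\} \to B)} \to \underline{\mathcal{X}}^{(i,1)}$$

$$\beta_N^{(\{i\} \to B), j} : \underline{\mathcal{Y}}^{(j,1)} \to \underline{\hat{\mathcal{X}}}^{(\{i\} \to B)}.$$

¹ Nonzero values of $R^{(A \to B)}$ are useful for upper bounding models derived later in the paper.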
(a) Degraded broadcast channel:

$$R^{(\{i\} \to \{j_1\})} = I(X^{(i,1)}; Y^{(j_1,1)}|U)$$
$$R^{(\{i\} \to V_2)} = I(U; Y^{(j_2,1)})$$

for some $p(u)\, p(x^{(i,1)}|u)\, p(y^{(j_1,1)}, y^{(j_2,1)}|x^{(i,1)})$.

(b) Multiple access channel:

$$R^{(\{i_1\} \to \{j\})} \le I(X^{(i_1,1)}; Y^{(j,1)}|X^{(i_2,1)}, Q)$$
$$R^{(\{i_2\} \to \{j\})} \le I(X^{(i_2,1)}; Y^{(j,1)}|X^{(i_1,1)}, Q)$$
$$R^{(\{i_1\} \to \{j\})} + R^{(\{i_2\} \to \{j\})} \le I(X^{(i_1,1)}, X^{(i_2,1)}; Y^{(j,1)}|Q)$$

for some $p(x^{(i_1,1)}|q)\, p(x^{(i_2,1)}|q)\, p(q)\, p(y^{(j,1)}|x^{(i_1,1)}, x^{(i_2,1)})$.

Fig. 4. Lower bounding models for the (a) degraded broadcast and (b) multiple access channels.



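Let $\underline{\mathcal{W}} = \prod_{(\{i\}, B) \in \mathcal{M}} \underline{\hat{\mathcal{X}}}^{(\{i\} \to B)}$. The code's average error probability is

$$P_e^{(N)} = \frac{1}{|\underline{\mathcal{W}}|} \sum_{\underline{w} \in \underline{\mathcal{W}}} \Pr\left(\bigcup_{(\{i\}, B) \in \mathcal{M}} \bigcup_{j \in B} \left\{\beta_N^{(\{i\} \to B), j}(\underline{Y}^{(j,1)}) \ne \underline{w}^{(\{i\} \to B)}\right\} \,\middle|\, \underline{X}^{(i,1)} = \alpha_N^{(i)}\big(\underline{w}^{(\{i\} \to B)} : B \subseteq V_2\big)\ \forall i \in V_1\right).$$

Definition 11 The capacity region $\mathscr{R}(C)$ of channel $C$ is the closure of all rate vectors $\mathcal{R}_C$ such that for any $\lambda > 0$ and all $N$ sufficiently large, there exists a $(2^{N\mathcal{R}_C}, N)$ channel code for channel $C$ with average error probability $P_e^{(N)} < \lambda$.

Lemma 3, below, shows that $\mathcal{R}_C \in \mathscr{R}(C)$ implies $C(\mathcal{R}_C)$ is a lower bounding model for $C$. Applying Lemma 3 with existing achievability bounds for any network gives immediate lower bounding models for that network. Figure 4 shows two examples. Zero capacity bit pipes can carry no bits, so they are not drawn.

Lemma 3 If $\mathcal{R}_C \in \mathscr{R}(C)$, then $C(\mathcal{R}_C) \subseteq C$.

Proof. The following argument treats points $\mathcal{R}_C \in \mathrm{int}(\mathscr{R}(C))$. The general result then follows: the inclusion $\mathscr{R}(\mathcal{N}(\mathcal{R}_C)) \subseteq \mathscr{R}(\mathcal{N})$ for all $\mathcal{R}_C \in \mathrm{int}(\mathscr{R}(C))$ and the continuity of $\mathscr{R}(\mathcal{N}(\mathcal{R}_C))$ in $\mathcal{R}_C$ by Lemma 2 together imply that $\mathscr{R}(\mathcal{N}(\mathcal{R}_C)) \subseteq \mathscr{R}(\mathcal{N})$ for all $\mathcal{R}_C \in \mathscr{R}(C)$ by the closure in the definition of the network capacity region.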

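Consider a pair of networks: the network

$$\mathcal{N} = (\mathcal{X}^{V_1} \times \mathcal{X}^{-V_1},\ p(y^{V_2}|x^{V_1})\,p(y^{-V_2}|x^{-V_1}),\ \mathcal{Y}^{V_2} \times \mathcal{Y}^{-V_2})$$

and the network $\mathcal{N}(\mathcal{R}_C)$ obtained from $\mathcal{N}$ by replacing channel $C$ with bit-pipe model $C(\mathcal{R}_C)$ while leaving the remaining component $p(y^{-V_2}|x^{-V_1})$ unchanged. Let $\underline{\mathcal{N}}$ and $\underline{\mathcal{N}}(\mathcal{R}_C)$ be the $N$-fold stacked networks for $\mathcal{N}$ and $\mathcal{N}(\mathcal{R}_C)$. By Theorem 1, it suffices to prove that $\mathscr{R}(\underline{\mathcal{N}}(\mathcal{R}_C)) \subseteq \mathscr{R}(\underline{\mathcal{N}})$ for $N$ sufficiently large. Fix any $\mathcal{R} \in \mathrm{int}(\mathscr{R}(\underline{\mathcal{N}}(\mathcal{R}_C)))$ and any $\lambda > 0$. We begin by building a rate-$\mathcal{R}$ stacked solution $\underline{S}(\underline{\mathcal{N}}(\mathcal{R}_C))$. By the stacked-solution result established earlier, there exists a sequence of stacked solutions $\underline{S}(\underline{\mathcal{N}}(\mathcal{R}_C))$ of some fixed blocklength $n$ (independent of $N$) but increasing stack size such that $\Pr(\hat{\underline{W}} \neq \underline{W}) < 2^{-N\delta}$ for all $N$ sufficiently large. Fix such a sequence of codes.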



Since $\mathcal{R}_C \in \mathrm{int}(\mathscr{R}(C))$, $\lambda > 0$, and $n$ are fixed, there exists a sequence of channel codes $\{(\alpha_N, \beta_N)\}_{N=1}^{\infty}$ for channel $C$ with encoders $\alpha_N = (\alpha_N^{(i)} : i \in V_1)$, decoders $\beta_N = (\beta_N^{(\{i\}\to B),j} : (\{i\},B) \in \mathcal{M}, j \in B)$, and average error $P_e^{(N)} < \lambda/(2n)$ for all $N$ sufficiently large.$^2$ For reasons that are explained below, we may wish to use different channel codes at each time $t$. We therefore use notation $(\alpha_{N,t}, \beta_{N,t})$ for the time-$t$ channel code, $t \in \{1, \ldots, n\}$.

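We now build a solution $S(\underline{\mathcal{N}})$ for $N$-fold stacked network $\underline{\mathcal{N}}$. Solution $S(\underline{\mathcal{N}})$ operates $\underline{S}(\underline{\mathcal{N}}(\mathcal{R}_C))$ across $\underline{\mathcal{N}}$ by channel encoding $\underline{X}_t^{V_1}$ before transmission across $C$ and channel decoding $\underline{Y}_t^{V_2}$ before use in the node encoders and decoders of $\underline{S}(\underline{\mathcal{N}}(\mathcal{R}_C))$. Precisely, at time $t$ node $v$ applies the node encoders from $\underline{S}(\underline{\mathcal{N}}(\mathcal{R}_C))$ as

$$\tilde{\underline{X}}_t^{(v)} = \underline{X}_t^{(v)}\big(\tilde{\underline{Y}}_1^{(v)}, \ldots, \tilde{\underline{Y}}_{t-1}^{(v)}, \underline{W}^{(v\to 1)}, \ldots, \underline{W}^{(v\to m)}\big),$$

where $\tilde{\underline{Y}}_t^{(v)}$ is the network output $\underline{Y}_t^{(v)}$ channel decoded (if necessary) as

$$\tilde{\underline{Y}}_t^{(v)} = \begin{cases} \Big(\big(\beta_{N,t}^{(\{i\}\to B),v}(\underline{Y}_t^{(v,1)}) : (\{i\},B) \in \mathcal{M},\ v \in B\big),\ \underline{Y}_t^{(v,2)}\Big) & \text{if } v \in V_2 \\ \underline{Y}_t^{(v)} & \text{otherwise.} \end{cases}$$

Node $v$ then applies channel encoder $\alpha_{N,t}^{(v)}$ (if necessary) as

$$\underline{X}_t^{(v)} = \begin{cases} \big(\alpha_{N,t}^{(v)}(\tilde{\underline{X}}_t^{(v,1)}),\ \tilde{\underline{X}}_t^{(v,2)}\big) & \text{if } v \in V_1 \\ \tilde{\underline{X}}_t^{(v)} & \text{otherwise,} \end{cases}$$

and then transmits across the network. At time $n$, node $v$ applies the decoder from $\underline{S}(\underline{\mathcal{N}}(\mathcal{R}_C))$ to give

$$\hat{\underline{W}}^{(u\to v)} = \hat{\underline{W}}^{(u\to v)}\big(\tilde{\underline{Y}}_1^{(v)}, \ldots, \tilde{\underline{Y}}_n^{(v)}, \underline{W}^{(v\to 1)}, \ldots, \underline{W}^{(v\to m)}\big).$$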

2 We here divide by $n$ since the channel code will be applied across the layers of the stack $n$ times, once for each $t \in \{1, \ldots, n\}$ for this blocklength-$n$ code. Application of the union bound then gives an error probability over these $n$ time steps.

To bound the error probability, note that two things can go wrong. Either the channel code can fail at one or more time steps, or all channel codes can succeed but the code can fail anyway. If the channel codes $\{(\alpha_{N,t}, \beta_{N,t})\}_{t=1}^{n}$ all succeed, then the conditional probability of an error given $\underline{W} = \underline{w}$ is precisely what it would have been for the original code. Let $E_t$ denote the event that the channel code $(\alpha_{N,t}, \beta_{N,t})$ employed at time $t$ fails. Then we bound the error probability as

$$\Pr(\hat{\underline{W}} \neq \underline{W}) \overset{(a)}{\leq} \sum_{t=1}^{n} \Pr(E_t) + \sum_{\underline{w}} \Pr\Big(\hat{\underline{W}} \neq \underline{W} \,\Big|\, \{\underline{W} = \underline{w}\} \cap \bigcap_{t=1}^{n} E_t^c\Big)\, p\Big(\underline{w} \cap \bigcap_{t=1}^{n} E_t^c\Big) \overset{(b)}{\leq} \frac{\lambda}{2} + 2^{-N\delta},$$

which is less than $\lambda$ for all $N$ sufficiently large. Inequality (a) follows from the union bound. Inequality (b) follows from the channel code's error probability bound and the observation that $p(\underline{w} \cap \bigcap_{t=1}^{n} E_t^c) \leq p(\underline{w})$ for all $\underline{w}$. Bounding the channel code's expected error probability in (a) is slightly subtle since the capacity definition guarantees only that the code's average error probability goes to zero. An argument suggested
by ifTil/ . reproduced as Lemma [77] in Appendix^ shows that, under careful choice of the channel code's 
index assignments, each channel code $(\alpha_{N,t}, \beta_{N,t})$ can achieve an expected error probability no greater than the code's average error probability $\lambda/(2n)$. Since the channel input distribution may vary with time, the channel code (or just the channel code's index assignments) may likewise need to vary with time. ■
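As a numerical illustration of the union-bound step in this proof, the following Python sketch evaluates a bound of the form $n \cdot \lambda/(2n) + 2^{-N\delta}$ and searches for the smallest stack size $N$ at which it falls below $\lambda$. The function names and sample values are hypothetical, and the second term assumes the $2^{-N\delta}$ error guarantee for the stacked solution quoted in the proof.

```python
def lemma3_error_bound(lam: float, n: int, delta: float, N: int) -> float:
    """Two-term bound: n channel-code uses, each failing with probability
    at most lam/(2n), plus the stacked code's own error 2^(-N*delta)."""
    channel_code_term = n * (lam / (2 * n))   # sum over t of Pr(E_t)
    stacked_code_term = 2.0 ** (-N * delta)   # assumed error of the stacked solution
    return channel_code_term + stacked_code_term


def smallest_stack_size(lam: float, n: int, delta: float) -> int:
    """Smallest N for which the bound is strictly below lam."""
    N = 1
    while lemma3_error_bound(lam, n, delta, N) >= lam:
        N += 1
    return N
```

The first term is independent of $N$, so the target $\lambda$ is met exactly when $2^{-N\delta} < \lambda/2$.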

Remark 1 The family of lower bounding models described in Lemma 3 is tight in the sense that there exist networks $\mathcal{N}$ for which the closure of $\bigcup_{\mathcal{R}_C \in \mathscr{R}(C)} \mathscr{R}(\mathcal{N}(\mathcal{R}_C))$ is precisely equal to $\mathscr{R}(\mathcal{N})$. This observation is immediate since network $\mathcal{N}$ can be the channel $C$ in isolation. Thus Lemma 3 does not necessarily give a tight capacity bound for all networks that employ channel $C$, but we cannot hope to increase the rates in this model and still obtain a lower bound for any network that contains $C$.

Just as Lemma 3 derives lower bounding models by showing that channel coding can be used to emulate a collection of noiseless bit pipes across a noisy channel, Theorem 4, below, derives upper bounding models by showing that lossy source coding can be used to emulate a noisy channel $C$ across a bit-pipe model $C(\mathcal{R}_C)$. Specifically, we prove that $\mathscr{R}(\mathcal{N}) \subseteq \mathscr{R}(\mathcal{N}(\mathcal{R}_C))$ by showing that we can run a solution $S(\underline{\mathcal{N}})$ across network $\underline{\mathcal{N}}(\mathcal{R}_C)$ with similar error probability if the source code can emulate the channel to sufficient accuracy. We therefore begin by defining source codes to run across a generic bit-pipe model $C(\mathcal{R}_C)$. The source codes introduced here differ from traditional source codes in that a good reproduction of $\underline{X}_t^{V_1}$ is not a value $\hat{\underline{X}}_t^{V_1}$ that reproduces it to low distortion but a value $\underline{Y}_t^{V_2}$ that is similar statistically to the output that would be observed if $\underline{X}_t^{V_1}$ were transmitted across $N$ independent copies of $C$. We therefore call the codes channel emulators and measure performance as emulation accuracy.

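Definition 12 A random $(2^{N\mathcal{R}_C}, N)$ emulator $\mathcal{C}_N = (\alpha_N, \beta_N)$ for channel $C = (\mathcal{X}^{V_1}, p(y^{V_2}|x^{V_1}), \mathcal{Y}^{V_2})$ under channel input distribution $p(x^{V_1})$ defines a distribution over the family of possible encoders $\alpha_N = (\alpha_N^{(A\to B)} : (A,B) \in \mathcal{M})$ and decoders $\beta_N = (\beta_N^{(j)} : j \in V_2)$, where

$$\alpha_N^{(A\to B)} : \prod_{i \in A} \underline{\mathcal{X}}^{(i,1)} \to \{1, \ldots, 2^{N R^{(A\to B)}}\}, \qquad \beta_N^{(j)} : \prod_{(A,B) \in \mathcal{M} : j \in B} \{1, \ldots, 2^{N R^{(A\to B)}}\} \to \underline{\mathcal{Y}}^{(j,1)}.$$

While any instance of code $(\alpha_N, \beta_N)$ is deterministic, the distribution over codes establishes an emulation distribution

$$\hat{p}(\underline{y}^{V_2}|\underline{x}^{V_1}) = \Pr\big(\beta_N(\alpha_N(\underline{x}^{V_1})) = \underline{y}^{V_2}\big).$$

For any $\nu > 0$, we define error probability $P_e^{(N)}(\nu)$ as

where, as usual, $p(\underline{x}^{V_1}) = \prod_{\ell=1}^{N} p(x^{V_1}(\ell))$ and $p(\underline{y}^{V_2}|\underline{x}^{V_1}) = \prod_{\ell=1}^{N} p(y^{V_2}(\ell)|x^{V_1}(\ell))$.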

Definition 13 The emulation region $\mathscr{E}(C)$ of channel $C$ is the closure of all rate vectors $\mathcal{R}_C$ such that for any input distribution $p(x^{V_1})$, any constant $\nu > 0$, and all $N$ sufficiently large there exists a sequence of $(2^{N\mathcal{R}_C}, N)$ emulation codes $(\alpha_N, \beta_N)$ with $P_e^{(N)}(\nu) < 2^{-N\eta(\nu)}$ for some positive function $\eta(\nu)$ dependent on $p$ such that $\eta(\nu)$ approaches 0 as $\nu$ approaches 0.

Theorem 4, below, demonstrates that the standard of accuracy used to define the emulation region is sufficient to guarantee that $C(\mathcal{R}_C)$ is an upper bounding model for $C$. Whether this condition is also necessary remains an open problem.

Theorem 4 If $\mathcal{R}_C \in \mathrm{int}(\mathscr{E}(C))$, then $C \subseteq C(\mathcal{R}_C)$.

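Proof. Fix rate vector $\mathcal{R}_C \in \mathrm{int}(\mathscr{E}(C))$, and consider a pair of networks: the network

$$\mathcal{N} = (\mathcal{X}^{V_1} \times \mathcal{X}^{-V_1},\ p(y^{V_2}|x^{V_1})\,p(y^{-V_2}|x^{-V_1}),\ \mathcal{Y}^{V_2} \times \mathcal{Y}^{-V_2})$$

and the network $\mathcal{N}(\mathcal{R}_C)$ obtained from $\mathcal{N}$ by replacing channel $C$ with bit-pipe model $C(\mathcal{R}_C)$.

Next fix $\mathcal{R} \in \mathrm{int}(\mathscr{R}(\mathcal{N}))$. The argument that follows shows that $\mathcal{R} \in \mathscr{R}(\underline{\mathcal{N}}(\mathcal{R}_C))$. This suffices to prove the desired result by Theorem 1 and the closure in the definition of $\mathscr{R}(\mathcal{N}(\mathcal{R}_C))$.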

Step 1 - Choose code $S(\mathcal{N})$ and define distribution $p_t(\underline{x}^{V_1}, \underline{y}^{V_2})$:

By the stacked-solution result established earlier, there exists a solution $S(\mathcal{N})$ of some finite blocklength $n$ from which we can build a $(2^{-N\delta}, \mathcal{R})$ stacked solution $\underline{S}(\underline{\mathcal{N}})$ for $N$-fold stacked network $\underline{\mathcal{N}}$ for all $N$ sufficiently large. Each stacked solution applies a random channel code to each message $\underline{W}^{(u\to v)}$ and then independently applies $S(\mathcal{N})$ in each layer of $\underline{\mathcal{N}}$. For each $t \in \{1, \ldots, n\}$, let $p_t(x^{V_1})$ be the input distribution to channel $C$ at time $t$ under solution $S(\mathcal{N})$. Then $p_t(\underline{x}^{V_1}, \underline{y}^{V_2}) \stackrel{\mathrm{def}}{=} \prod_{\ell=1}^{N} p_t(x^{V_1}(\ell))\, p(y^{V_2}(\ell)|x^{V_1}(\ell))$ is the time-$t$ distribution across the $N$ copies of channel $C$ in network $\underline{\mathcal{N}}$ using solution $\underline{S}(\underline{\mathcal{N}})$.

Step 2 - Choose channel emulators and bound the probability of emulation failure: 

For each $t \in \{1, \ldots, n\}$, choose $\nu(t) > 0$ to satisfy

$$\sum_{t'=1}^{t-1} \nu(t') < \eta_t(\nu(t))/2 \quad \forall t \in \{1, \ldots, n\}$$
$$\sum_{t=1}^{n} \nu(t) < \delta/2,$$

where $\eta_t(\cdot)$ designates the function $\eta$ corresponding to channel input distribution $p_t(x^{V_1})$; these parameter choices make the error probability vanish in Step 5, below. We meet these constraints through the following sequence of parameter choices. First, set $\nu(n) = \delta/(4n)$. Then, in order of decreasing $t$ for each $t < n$, set $\nu(t) = \min\{\delta/(4n),\ \min_{t' > t} \eta_{t'}(\nu(t'))/(4t')\}$.
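The backward recursion for $\nu(1), \ldots, \nu(n)$ can be sketched in Python as follows. This is only an illustration: the functions `choose_nu` and `satisfies_constraints` are hypothetical names, and the sample $\eta_t$ functions used to exercise them are arbitrary positive functions that vanish as their argument approaches 0, not functions derived from any particular channel.

```python
from typing import Callable, List


def choose_nu(delta: float, eta: List[Callable[[float], float]]) -> List[float]:
    """Backward recursion: nu(n) = delta/(4n), then for t = n-1, ..., 1
    nu(t) = min(delta/(4n), min over t' > t of eta_{t'}(nu(t')) / (4 t')).
    eta[t-1] plays the role of eta_t; the result stores nu(t) at index t-1."""
    n = len(eta)
    nu = [0.0] * n
    nu[n - 1] = delta / (4 * n)
    for t in range(n - 1, 0, -1):
        later = min(eta[tp - 1](nu[tp - 1]) / (4 * tp) for tp in range(t + 1, n + 1))
        nu[t - 1] = min(delta / (4 * n), later)
    return nu


def satisfies_constraints(delta: float, eta, nu: List[float]) -> bool:
    """Check sum_t nu(t) < delta/2 and, for every t,
    sum_{t' < t} nu(t') < eta_t(nu(t)) / 2."""
    n = len(nu)
    total_ok = sum(nu) < delta / 2
    prefix_ok = all(sum(nu[: t - 1]) < eta[t - 1](nu[t - 1]) / 2 for t in range(1, n + 1))
    return total_ok and prefix_ok
```

The recursion works because each $\nu(t')$ with $t' < t$ is at most $\eta_t(\nu(t))/(4t)$, so the $t-1$ earlier terms sum to less than $\eta_t(\nu(t))/2$, while the cap $\delta/(4n)$ keeps the full sum at or below $\delta/4$.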

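Since $\mathcal{R}_C \in \mathrm{int}(\mathscr{E}(C))$, for each $t \in \{1, \ldots, n\}$ there exists a sequence of $(2^{N\mathcal{R}_C}, N)$ random emulation codes $\mathcal{C}_{N,t} = (\alpha_{N,t}, \beta_{N,t})$ that emulate channel $C$ under input distribution $p_t(x^{V_1})$ with probability $P_e^{(N)}(\nu(t)) < 2^{-N\eta_t(\nu(t))}$ for all $N$ sufficiently large. Let $\hat{p}_{N,t}(\underline{y}^{V_2}|\underline{x}^{V_1})$ be the emulation distribution for $\mathcal{C}_{N,t}$, and define

$$C_t^{(N)} \stackrel{\mathrm{def}}{=} \Big\{ \underline{x}^{V_1} : \hat{p}_t\big((A_t^{(N)})^c \,\big|\, \underline{x}^{V_1}\big) > 2^{-N\eta_t(\nu(t))/2} \Big\},$$

where for any set $S \subseteq \underline{\mathcal{X}}^{V_1} \times \underline{\mathcal{Y}}^{V_2}$,

$$\hat{p}_t(S|\underline{x}^{V_1}) \stackrel{\mathrm{def}}{=} \sum_{\underline{y}^{V_2} : (\underline{x}^{V_1}, \underline{y}^{V_2}) \in S} \hat{p}_{N,t}(\underline{y}^{V_2}|\underline{x}^{V_1}).$$

To bound $p_t(C_t^{(N)}) = \sum_{\underline{x}^{V_1} \in C_t^{(N)}} p_t(\underline{x}^{V_1})$, note that

$$2^{-N\eta_t(\nu(t))} > \sum_{\underline{x}^{V_1} \in C_t^{(N)}} p_t(\underline{x}^{V_1})\, \hat{p}_t\big((A_t^{(N)})^c \big| \underline{x}^{V_1}\big) + \sum_{\underline{x}^{V_1} \notin C_t^{(N)}} p_t(\underline{x}^{V_1})\, \hat{p}_t\big((A_t^{(N)})^c \big| \underline{x}^{V_1}\big)$$
$$\geq 2^{-N\eta_t(\nu(t))/2} \sum_{\underline{x}^{V_1} \in C_t^{(N)}} p_t(\underline{x}^{V_1}) + 0 \cdot \sum_{\underline{x}^{V_1} \notin C_t^{(N)}} p_t(\underline{x}^{V_1}),$$

giving $p_t(C_t^{(N)}) < 2^{-N\eta_t(\nu(t))/2}$.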



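Step 3 - Define solution $S(\underline{\mathcal{N}}(\mathcal{R}_C))$:

Let $S(\underline{\mathcal{N}}(\mathcal{R}_C))$ be the code that results from operating solution $\underline{S}(\underline{\mathcal{N}})$ across network $\underline{\mathcal{N}}(\mathcal{R}_C)$ with the aid of emulation codes $\{(\alpha_{N,t}, \beta_{N,t})\}_{t=1}^{n}$. Formally, for each $v \in V$, let $\tilde{\underline{Y}}_t^{(v)}$ denote the network output received by node $v$ at time $t$. At time $t$, node $v$ applies the node encoder from $\underline{S}(\underline{\mathcal{N}})$ to obtain

$$\underline{X}_t^{(v)} = \underline{X}_t^{(v)}\big(\underline{Y}_1^{(v)}, \ldots, \underline{Y}_{t-1}^{(v)}, \underline{W}^{(v\to 1)}, \ldots, \underline{W}^{(v\to m)}\big);$$

here $\underline{Y}_t^{(v)}$ is the channel output $\tilde{\underline{Y}}_t^{(v)}$ decoded (if necessary) as

$$\underline{Y}_t^{(v)} = \begin{cases} \big(\beta_{N,t}^{(v)}(\tilde{\underline{Y}}_t^{(v,1)}),\ \tilde{\underline{Y}}_t^{(v,2)}\big) & \text{if } v \in V_2 \\ \tilde{\underline{Y}}_t^{(v)} & \text{otherwise.} \end{cases}$$

Node $v$ then encodes $\underline{X}_t^{(v)}$ (if necessary) to give

$$\tilde{\underline{X}}_t^{(v)} = \begin{cases} \Big(\big(\alpha_{N,t}^{(\{v\}\to B)}(\underline{X}_t^{(v,1)}) : B \subseteq V_2\big),\ \underline{X}_t^{(v,1)},\ \underline{X}_t^{(v,2)}\Big) & \text{if } v \in V_1 \\ \big(\alpha_{N,t}^{(A\to B)}(\underline{X}_t^{(v',1)} : v' \in A) : B \subseteq V_2\big) & \text{if } v = v_A \text{ for some } A \subseteq V_1 \\ \underline{X}_t^{(v)} & \text{otherwise,} \end{cases}$$

which it transmits across the bit-pipe model. After time $n$, node $v$ applies the decoders from $\underline{S}(\underline{\mathcal{N}})$ as

$$\hat{\underline{W}}^{(u\to v)} = \hat{\underline{W}}^{(u\to v)}\big(\underline{Y}_1^{(v)}, \ldots, \underline{Y}_n^{(v)}, \underline{W}^{(v\to 1)}, \ldots, \underline{W}^{(v\to m)}\big).$$

Solution $S(\underline{\mathcal{N}}(\mathcal{R}_C))$ is not a stacked solution since each $(\alpha_{N,t}, \beta_{N,t})$ operates across the layers of the stack.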

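Step 4 - Characterize the statistical behavior of $S(\underline{\mathcal{N}}(\mathcal{R}_C))$:

Under the operation of $\underline{S}(\underline{\mathcal{N}})$ on $\underline{\mathcal{N}}$, the joint distribution on messages $\underline{w}$, network input vectors $\underline{x}^n = (\underline{x}_1, \ldots, \underline{x}_n)$, network output vectors $\underline{y}^n = (\underline{y}_1, \ldots, \underline{y}_n)$, and message reconstructions $\hat{\underline{w}}$ is

$$p(\underline{w}, \underline{x}^n, \underline{y}^n, \hat{\underline{w}}) = p(\underline{w}) \left[ \prod_{t=1}^{n} p(\underline{x}_t | \underline{y}^{t-1}, \underline{w})\, p(\underline{y}_t^{V_2} | \underline{x}_t^{V_1})\, p(\underline{y}_t^{-V_2} | \underline{x}_t^{-V_1}) \right] p(\hat{\underline{w}} | \underline{y}^n, \underline{w}),$$

where $\underline{x}_t$ and $\underline{y}_t$ again represent the full vectors of network inputs and outputs at time $t$; $p(\underline{w})$ is the distribution on messages; each $p(\underline{x}_t | \underline{y}^{t-1}, \underline{w})$ is a product distribution describing the independent operations performed by the node encoders at time $t$; $p(\underline{y}_t^{V_2} | \underline{x}_t^{V_1})\, p(\underline{y}_t^{-V_2} | \underline{x}_t^{-V_1})$ describes the memoryless network distribution; and $p(\hat{\underline{w}} | \underline{y}^n, \underline{w})$ is the product distribution describing the independent operation of each node decoder. Only the channel distribution changes when we run $S(\underline{\mathcal{N}}(\mathcal{R}_C))$ on $\underline{\mathcal{N}}(\mathcal{R}_C)$, giving

$$\hat{p}(\underline{w}, \underline{x}^n, \underline{y}^n, \hat{\underline{w}}) = p(\underline{w}) \left[ \prod_{t=1}^{n} p(\underline{x}_t | \underline{y}^{t-1}, \underline{w})\, \hat{p}_t(\underline{y}_t^{V_2} | \underline{x}_t^{V_1})\, p(\underline{y}_t^{-V_2} | \underline{x}_t^{-V_1}) \right] p(\hat{\underline{w}} | \underline{y}^n, \underline{w}).$$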



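Step 5 - Bound the expected error probability:

The following error analysis relies on both probabilities resulting from running $\underline{S}(\underline{\mathcal{N}})$ on $\underline{\mathcal{N}}$ and probabilities resulting from running $S(\underline{\mathcal{N}}(\mathcal{R}_C))$ on $\underline{\mathcal{N}}(\mathcal{R}_C)$. We use $\Pr(\cdot)$ for the former and $\widehat{\Pr}(\cdot)$ for the latter. Let

$$B_t^{(N)} \stackrel{\mathrm{def}}{=} \Big\{ (\underline{x}^{V_1}, \underline{y}^{V_2}) : \Pr\big(\hat{\underline{W}} \neq \underline{W} \,\big|\, (\underline{X}_t^{V_1}, \underline{Y}_t^{V_2}) = (\underline{x}^{V_1}, \underline{y}^{V_2})\big) > 2^{-N\delta/2} \Big\} \qquad (4)$$

denote the set of input-output pairs on channel $C$ at time $t$ that are most likely to lead to errors in the operation of $\underline{S}(\underline{\mathcal{N}})$ on $\underline{\mathcal{N}}$. The following error probability bound treats $(\underline{X}_t^{V_1}, \underline{Y}_t^{V_2}) \notin A_t^{(N)}$ and $(\underline{X}_t^{V_1}, \underline{Y}_t^{V_2}) \in B_t^{(N)}$ for any $t \in \{1, \ldots, n\}$ as error events. We therefore define

$$G_t \stackrel{\mathrm{def}}{=} \Big\{ (\underline{x}^{V_1}, \underline{y}^{V_2}) : (\underline{x}^{V_1}, \underline{y}^{V_2}) \in A_t^{(N)} \setminus B_t^{(N)} \Big\},$$

writing $G_t$ inside a probability as shorthand for the event $(\underline{X}_t^{V_1}, \underline{Y}_t^{V_2}) \in G_t$, and bound the expected error probability of code $S(\underline{\mathcal{N}}(\mathcal{R}_C))$ as

$$\widehat{\Pr}(\hat{\underline{W}} \neq \underline{W}) \leq \sum_{t=1}^{n} \widehat{\Pr}\Big(\bigcap_{t' < t} G_{t'} \cap (A_t^{(N)})^c\Big) + \sum_{t=1}^{n} \widehat{\Pr}\Big(\bigcap_{t' < t} G_{t'} \cap A_t^{(N)} \cap B_t^{(N)}\Big) + \widehat{\Pr}\Big(\bigcap_{t' \leq n} G_{t'} \cap \{\hat{\underline{W}} \neq \underline{W}\}\Big).$$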

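To bound the first two terms in the sum, note that for each $\underline{x}^{V_1} \in \underline{\mathcal{X}}^{V_1}$,

$$\widehat{\Pr}\Big(\bigcap_{t' < t} G_{t'} \cap \{\underline{X}_t^{V_1} = \underline{x}^{V_1}\}\Big) = \sum_{(\underline{w}, \underline{x}^{t-1}, \underline{y}^{t-1}, \underline{x}_t^{-V_1}) : (\underline{x}_{t'}^{V_1}, \underline{y}_{t'}^{V_2}) \in G_{t'}\ \forall t' < t} p(\underline{w}) \left[ \prod_{t'=1}^{t-1} p(\underline{x}_{t'} | \underline{y}^{t'-1}, \underline{w})\, \hat{p}_{t'}(\underline{y}_{t'}^{V_2} | \underline{x}_{t'}^{V_1})\, p(\underline{y}_{t'}^{-V_2} | \underline{x}_{t'}^{-V_1}) \right] p(\underline{x}_t | \underline{y}^{t-1}, \underline{w})$$
$$\leq 2^{N \sum_{t'=1}^{t-1} \nu(t')} \sum_{(\underline{w}, \underline{x}^{t-1}, \underline{y}^{t-1}, \underline{x}_t^{-V_1}) : (\underline{x}_{t'}^{V_1}, \underline{y}_{t'}^{V_2}) \in G_{t'}\ \forall t' < t} p(\underline{w}) \left[ \prod_{t'=1}^{t-1} p(\underline{x}_{t'} | \underline{y}^{t'-1}, \underline{w})\, p(\underline{y}_{t'}^{V_2} | \underline{x}_{t'}^{V_1})\, p(\underline{y}_{t'}^{-V_2} | \underline{x}_{t'}^{-V_1}) \right] p(\underline{x}_t | \underline{y}^{t-1}, \underline{w})$$
$$\leq 2^{N \sum_{t'=1}^{t-1} \nu(t')}\, p_t(\underline{x}^{V_1}),$$

since $(\underline{x}_{t'}^{V_1}, \underline{y}_{t'}^{V_2}) \in G_{t'}$ implies $(\underline{x}_{t'}^{V_1}, \underline{y}_{t'}^{V_2}) \in A_{t'}^{(N)}$, which implies $\hat{p}_{t'}(\underline{y}_{t'}^{V_2} | \underline{x}_{t'}^{V_1}) \leq 2^{N\nu(t')}\, p(\underline{y}_{t'}^{V_2} | \underline{x}_{t'}^{V_1})$. Thus

$$\widehat{\Pr}\Big(\bigcap_{t' < t} G_{t'} \cap (A_t^{(N)})^c\Big) \leq 2^{N \sum_{t'=1}^{t-1} \nu(t')} \left[ \sum_{\underline{x}^{V_1} \in C_t^{(N)}} p_t(\underline{x}^{V_1})\, \hat{p}_t\big((A_t^{(N)})^c \big| \underline{x}^{V_1}\big) + \sum_{\underline{x}^{V_1} \notin C_t^{(N)}} p_t(\underline{x}^{V_1})\, \hat{p}_t\big((A_t^{(N)})^c \big| \underline{x}^{V_1}\big) \right]$$
$$< 2^{-N\left(\eta_t(\nu(t))/2 - \sum_{t'=1}^{t-1} \nu(t')\right)} + 2^{-N\left(\eta_t(\nu(t))/2 - \sum_{t'=1}^{t-1} \nu(t')\right)}$$

by the definition of $C_t^{(N)}$ and the bound on $p_t(C_t^{(N)})$ from Step 2. This sum goes to zero as $N$ grows without bound by our earlier parameter choice.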



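To bound $\widehat{\Pr}(\bigcap_{t' < t} G_{t'} \cap A_t^{(N)} \cap B_t^{(N)})$, recall that $\underline{S}(\underline{\mathcal{N}})$ is a $(2^{-N\delta}, \mathcal{R})$ solution and there are fewer than $m^2$ messages to transmit. Thus $\Pr(\hat{\underline{W}} \neq \underline{W}) < m^2 2^{-N\delta}$ for solution $\underline{S}(\underline{\mathcal{N}})$ by the union bound, giving

$$m^2 2^{-N\delta} > \Pr(\hat{\underline{W}} \neq \underline{W}) \geq \sum_{(\underline{x}^{V_1}, \underline{y}^{V_2}) \in B_t^{(N)}} p_t(\underline{x}^{V_1}, \underline{y}^{V_2})\, \Pr\big(\hat{\underline{W}} \neq \underline{W} \,\big|\, (\underline{X}_t^{V_1}, \underline{Y}_t^{V_2}) = (\underline{x}^{V_1}, \underline{y}^{V_2})\big) > 2^{-N\delta/2}\, p_t(B_t^{(N)}),$$

giving $p_t(B_t^{(N)}) < m^2 2^{-N\delta/2}$. Thus

$$\widehat{\Pr}\Big(\bigcap_{t' < t} G_{t'} \cap A_t^{(N)} \cap B_t^{(N)}\Big) \leq 2^{N \sum_{t'=1}^{t-1} \nu(t')} \sum_{(\underline{x}^{V_1}, \underline{y}^{V_2}) \in A_t^{(N)} \cap B_t^{(N)}} p_t(\underline{x}^{V_1})\, \hat{p}_t(\underline{y}^{V_2} | \underline{x}^{V_1})$$
$$\leq 2^{N \sum_{t'=1}^{t} \nu(t')} \sum_{(\underline{x}^{V_1}, \underline{y}^{V_2}) \in A_t^{(N)} \cap B_t^{(N)}} p_t(\underline{x}^{V_1})\, p(\underline{y}^{V_2} | \underline{x}^{V_1}) \leq m^2\, 2^{-N\left(\delta/2 - \sum_{t'=1}^{t} \nu(t')\right)},$$

which also goes to zero by our choice of $\nu(1), \ldots, \nu(n)$.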
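Finally,

$$\widehat{\Pr}\Big(\bigcap_{t=1}^{n} G_t \cap \{\hat{\underline{W}} \neq \underline{W}\}\Big) \overset{(a)}{\leq} 2^{N \sum_{t=1}^{n} \nu(t)} \sum_{(\underline{w}, \underline{x}^n, \underline{y}^n, \hat{\underline{w}}) : \hat{\underline{w}} \neq \underline{w},\ (\underline{x}_t^{V_1}, \underline{y}_t^{V_2}) \in A_t^{(N)} \setminus B_t^{(N)}\ \forall t} p(\underline{w}, \underline{x}^n, \underline{y}^n, \hat{\underline{w}})$$
$$\overset{(b)}{\leq} 2^{N \sum_{t=1}^{n} \nu(t)} \sum_{(\underline{x}^{V_1}, \underline{y}^{V_2}) \in A_1^{(N)} \setminus B_1^{(N)}} p_1(\underline{x}^{V_1}, \underline{y}^{V_2})\, \Pr\big(\hat{\underline{W}} \neq \underline{W} \,\big|\, (\underline{X}_1^{V_1}, \underline{Y}_1^{V_2}) = (\underline{x}^{V_1}, \underline{y}^{V_2})\big)$$
$$\overset{(c)}{\leq} 2^{-N\left(\delta/2 - \sum_{t=1}^{n} \nu(t)\right)}.$$

Equation (a) follows from our probability characterization in Step 4 since $(\underline{x}_t^{V_1}, \underline{y}_t^{V_2}) \in A_t^{(N)}$ for all $t$ by definition of $G_t$. In (b), we bound the sum over $A_t^{(N)} \setminus B_t^{(N)}$ by the sum over all $\underline{\mathcal{X}}^{V_1} \times \underline{\mathcal{Y}}^{V_2}$ for all $t > 1$. Equation (c) follows from the definition of $B_t^{(N)}$ in (4). This term also goes to zero as $N$ grows large by our choice of $\nu(1), \ldots, \nu(n)$. Since the expected error probability for our randomly drawn code can be made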
arbitrarily small, there exists a single instance that does at least as well. Thus $\mathcal{R} \in \mathscr{R}(\underline{\mathcal{N}}(\mathcal{R}_C))$. ■

Remark 2 Lemma 2 can be used to show that Theorem 4 also holds for all points $\mathcal{R}_C$ on the outer boundary of $\mathscr{E}(C)$ with strictly positive coefficients. It is not clear whether it holds for all boundary points since the capacity region is not known to be continuous at $R = 0$ for general networks [12], [13].

V. Upper Bounding Models 

While existing achievability results for individual channels lead immediately to lower bounding networks (see Lemma 3), capacity upper bounds do not generally give legitimate upper bounding networks. Roughly speaking, there are two causes of this phenomenon. First, capacity upper bounds for multi-input channels ($|V_1| > 1$) assume independent transmissions from their input nodes; when the channel is used within a larger network, the inputs may be statistically dependent. Second, capacity upper bounds assume reliable transmission across each component channel; operating individual channels above their capacities sometimes increases the network capacity, as shown in Examples 1 and 2. By Theorem 4, we can build upper bounding models by finding points in the emulation region described in Definition 12.

We here derive example upper bounding models for the broadcast, multiple access, and interference channels. All of the results use the bit-pipe models defined in Section III, removing bit pipes of capacity 0. Recall that for each $A \subseteq V_1$, internal node $v_A$ receives a noiseless description of channel inputs $(X^{(v,1)} : v \in A)$. These noiseless descriptions are transmitted along internal edges of capacity $\log |\mathcal{X}^{(v,1)}|$, as described in the model definitions; $\log |\mathcal{X}^{(v,1)}|$ is infinite when $\mathcal{X}^{(v,1)}$ is continuous. In Section VI, we bound the accuracy of capacity bounds derived using these models for a variety of example channel types, including channels with continuous alphabets.

This section derives general form solutions. Examples for specific channels appear in Section VI. Each result describes a family of upper bounding models both because multiple rate vectors satisfy the given bounds and because switching the roles of the nodes in asymmetrical solutions may yield new bounds. Taking the intersection of the rate regions corresponding to different bounds may yield a tighter bound.

Given a broadcast channel with transmitter $V_1 = \{i\}$ and receivers $V_2 = \{j_1, j_2\}$, Theorem 5 derives an upper bounding model of the form shown in Figure 5(a).

Theorem 5 Let

$$C = \big(\mathcal{X}^{(i,1)},\ p(y^{(j_1,1)}, y^{(j_2,1)} | x^{(i,1)}),\ \mathcal{Y}^{(j_1,1)} \times \mathcal{Y}^{(j_2,1)}\big)$$

and $C(\mathcal{R}_C)$ be a broadcast channel and its corresponding bit-pipe model for some $\mathcal{R}_C$ satisfying

$$R^{(\{i\}\to\{j_1,j_2\})} + R^{(\{i\}\to\{j_1\})} > I(X^{(i,1)}; Y^{(j_1,1)}, Y^{(j_2,1)})$$

for all distributions $p(x^{(i,1)}, y^{(j_1,1)}, y^{(j_2,1)}) = p(x^{(i,1)})\, p(y^{(j_1,1)}, y^{(j_2,1)} | x^{(i,1)})$. Then $C \subseteq C(\mathcal{R}_C)$.

Proof. See the appendix. ■

Fig. 5. Upper bounding models for the (a) broadcast channel (Theorem 5), and (b) multiple access channel (Theorem 6).

Theorem 6 derives an upper bounding model of the form shown in Figure 5(b) for a multiple access channel with transmitters $V_1 = \{i_1, i_2\}$ and receiver $V_2 = \{j\}$.

Theorem 6 Let

$$C = \big(\mathcal{X}^{(i_1,1)} \times \mathcal{X}^{(i_2,1)},\ p(y^{(j,1)} | x^{(i_1,1)}, x^{(i_2,1)}),\ \mathcal{Y}^{(j,1)}\big)$$

and $C(\mathcal{R}_C)$ be a multiple access channel and its corresponding bit-pipe model for some $\mathcal{R}_C$. If for each distribution $p(x^{(i_1,1)}, x^{(i_2,1)})$ there exists a distribution $p(u | x^{(i_1,1)})$ on an alphabet $\mathcal{U}$ with $|\mathcal{U}| \leq |\mathcal{X}^{(i_1,1)}|$ such that

then $C \subseteq C(\mathcal{R}_C)$.

Proof. See the appendix. ■



Let $C = \big(\mathcal{X}^{(i_1,1)} \times \mathcal{X}^{(i_2,1)},\ p(y^{(j_1,1)}, y^{(j_2,1)} | x^{(i_1,1)}, x^{(i_2,1)}),\ \mathcal{Y}^{(j_1,1)} \times \mathcal{Y}^{(j_2,1)}\big)$ be an interference channel with transmitters $V_1 = \{i_1, i_2\}$ and receivers $V_2 = \{j_1, j_2\}$. Theorems 7 and 8 derive upper bounding models for $C$ of the forms shown in Figures 6(a) and (b), respectively. In the first case, node $i_1$ transmits two descriptions, one to just $j_1$ and the other to both receivers. Node $v_{V_1}$ noiselessly receives both channel inputs and transmits one description to $j_1$ and the other to both receivers.

Fig. 6. Upper bounding models for the interference channel: (a) Model 1, (b) Model 2.

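Theorem 7 Let

$$C = \big(\mathcal{X}^{(i_1,1)} \times \mathcal{X}^{(i_2,1)},\ p(y^{(j_1,1)}, y^{(j_2,1)} | x^{(i_1,1)}, x^{(i_2,1)}),\ \mathcal{Y}^{(j_1,1)} \times \mathcal{Y}^{(j_2,1)}\big)$$

and $C(\mathcal{R}_C)$ be an interference channel and its rate-$\mathcal{R}_C$ bit-pipe model. If for each distribution $p(x^{(i_1,1)}, x^{(i_2,1)})$ there exist conditional distributions $p(u_2 | x^{(i_1,1)})$ and $p(u_1 | x^{(i_1,1)}, u_2)$ with $|\mathcal{U}_1 \times \mathcal{U}_2| \leq |\mathcal{X}^{(i_1,1)}|$ and

$$R^{(\{i_1\}\to\{j_1\})} + R^{(\{i_1\}\to\{j_1,j_2\})} > I(X^{(i_1,1)}; U_1, U_2)$$
$$R^{(\{i_1\}\to\{j_1,j_2\})} > I(X^{(i_1,1)}; U_2)$$
$$R^{(\{i_1,i_2\}\to\{j_1\})} + R^{(\{i_1,i_2\}\to\{j_1,j_2\})} > I(X^{(i_1,1)}, X^{(i_2,1)}; Y^{(j_1,1)} \mid U_1, U_2, Y^{(j_2,1)}) + I(X^{(i_1,1)}, X^{(i_2,1)}; Y^{(j_2,1)} \mid U_2)$$
$$R^{(\{i_1,i_2\}\to\{j_1,j_2\})} > I(X^{(i_1,1)}, X^{(i_2,1)}; Y^{(j_2,1)} \mid U_2)$$

then $C \subseteq C(\mathcal{R}_C)$.

Proof. See the appendix. ■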

In the second bit-pipe model for the interference channel, node $i_1$ again transmits two descriptions. Here the first is delivered to both receivers while the second is delivered only to $j_2$. Node $v_{V_1}$ noiselessly receives both channel inputs and transmits one description to both receivers and the other only to $j_2$.

Theorem 8 Let

$$C = \big(\mathcal{X}^{(i_1,1)} \times \mathcal{X}^{(i_2,1)},\ p(y^{(j_1,1)}, y^{(j_2,1)} | x^{(i_1,1)}, x^{(i_2,1)}),\ \mathcal{Y}^{(j_1,1)} \times \mathcal{Y}^{(j_2,1)}\big)$$

and $C(\mathcal{R}_C)$ be an interference channel and its rate-$\mathcal{R}_C$ bit-pipe model. If for each distribution $p(x^{(i_1,1)}, x^{(i_2,1)})$ there exist conditional distributions $p(u_1 | x^{(i_1,1)})$ and $p(u_2 | u_1, x^{(i_1,1)})$ with $|\mathcal{U}_1 \times \mathcal{U}_2| \leq |\mathcal{X}^{(i_1,1)}|$ for which

$$R^{(\{i_1\}\to\{j_1,j_2\})} > I(X^{(i_1,1)}; U_1)$$
$$R^{(\{i_1\}\to\{j_1,j_2\})} + R^{(\{i_1\}\to\{j_2\})} > I(X^{(i_1,1)}; U_1, U_2)$$
$$R^{(\{i_1,i_2\}\to\{j_1,j_2\})} > I(X^{(i_1,1)}, X^{(i_2,1)}; Y^{(j_1,1)} \mid U_1)$$

then $C \subseteq C(\mathcal{R}_C)$.

Proof. See the appendix. ■

VI. Bounding Accuracy 

The equivalence tools derived in Section IV yield upper and lower bounding models for a single independent channel $C$. Repeated application of these tools on networks containing multiple independent channels allows us to bound the capacity of a network of noisy channels by bounding the capacity of another network in which some or all of the network's stochastic components have been replaced by bit-pipe models. To make this precise, let $\mathcal{N}$ be a network containing some collection $\mathcal{A}$ of independent channels. Then for any $\mathcal{R}_L = (\mathcal{R}_{C,L} : C \in \mathcal{A}) \in \prod_{C \in \mathcal{A}} \mathscr{R}(C)$ and any $\mathcal{R}_U = (\mathcal{R}_{C,U} : C \in \mathcal{A}) \in \prod_{C \in \mathcal{A}} \mathscr{E}(C)$, $\mathcal{R}_{C,L}$ and $\mathcal{R}_{C,U}$ describe lower and upper bounds for $C$ (i.e., $C(\mathcal{R}_{C,L}) \subseteq C \subseteq C(\mathcal{R}_{C,U})$) for each $C \in \mathcal{A}$. Let $\mathcal{N}(\mathcal{R}_L)$ denote the network obtained by replacing each $C \in \mathcal{A}$ by its lower bounding model $C(\mathcal{R}_{C,L})$ and $\mathcal{N}(\mathcal{R}_U)$ denote the network obtained by replacing each $C \in \mathcal{A}$ by its upper bounding model $C(\mathcal{R}_{C,U})$. Then Lemma 3 and Theorem 4 imply

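$$\mathscr{R}_L(\mathcal{N}) \subseteq \mathscr{R}(\mathcal{N}) \subseteq \mathscr{R}_U(\mathcal{N}),$$

where

$$\mathscr{R}_L(\mathcal{N}) = \bigcup_{\mathcal{R}_L \in \prod_{C \in \mathcal{A}} \mathscr{R}(C)} \mathscr{R}(\mathcal{N}(\mathcal{R}_L)), \qquad \mathscr{R}_U(\mathcal{N}) = \bigcap_{\mathcal{R}_U \in \mathrm{int}(\prod_{C \in \mathcal{A}} \mathscr{E}(C))} \mathscr{R}(\mathcal{N}(\mathcal{R}_U)).$$

The discussion that follows finds multiplicative and additive bounds on the difference between $\mathscr{R}_L(\mathcal{N})$ and $\mathscr{R}_U(\mathcal{N})$, thereby bounding the accuracy of $\mathscr{R}_L(\mathcal{N})$ and $\mathscr{R}_U(\mathcal{N})$ as approximations for $\mathscr{R}(\mathcal{N})$.

Lemma 9, below, shows that there exists a constant $a \in [0,1]$ such that $\mathcal{R} \in \mathscr{R}_U(\mathcal{N})$ implies $a\mathcal{R} \in \mathscr{R}_L(\mathcal{N})$; we henceforth use notation

$$\mathscr{R}_L(\mathcal{N}) \geq a\, \mathscr{R}_U(\mathcal{N})$$

to specify this relationship. Lemma 9's strength is that it applies to all demand types and does not increase with the network size; its weakness is that constant $a$ is determined by the worst-case channel in $\mathcal{A}$. The following definition is used in that result. Recall from Section III that the models for vectors $\mathcal{R}_{C,L} = (R_{C,L}^{(A\to B)} : (A,B) \in \mathcal{M}) \in \mathscr{R}(C)$ and $\mathcal{R}_{C,U} = (R_{C,U}^{(A\to B)} : (A,B) \in \mathcal{M}) \in \mathscr{E}(C)$ are identical in their topologies (except for possible missing edges corresponding to rate-0 entries in $\mathcal{R}_{C,L}$ or $\mathcal{R}_{C,U}$). We can therefore define the worst-case ratio between individual edges of these models as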

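$$\rho(C) \stackrel{\mathrm{def}}{=} \sup_{(\mathcal{R}_{C,L}, \mathcal{R}_{C,U}) \in \mathscr{R}(C) \times \mathrm{int}(\mathscr{E}(C))}\ \min_{(A,B) \in \mathcal{M} :\ R_{C,U}^{(A\to B)} > R_{C,L}^{(A\to B)},\ R_{C,U}^{(A\to B)} > 0} \frac{R_{C,L}^{(A\to B)}}{R_{C,U}^{(A\to B)}}.$$

Lemma 9

$$\mathscr{R}_L(\mathcal{N}) \geq \Big[\min_{C \in \mathcal{A}} \rho(C)\Big]\, \mathscr{R}_U(\mathcal{N}).$$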

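Proof. Let $a = \min_{C \in \mathcal{A}} \rho(C)$, and for each $C \in \mathcal{A}$ fix some sequence $\{(\mathcal{R}_{C,L,k}, \mathcal{R}_{C,U,k})\}_{k=1}^{\infty}$ such that $(\mathcal{R}_{C,L,k}, \mathcal{R}_{C,U,k}) \in \mathscr{R}(C) \times \mathrm{int}(\mathscr{E}(C))$ for all $k$ and ratio

$$a_{C,k} \stackrel{\mathrm{def}}{=} \min_{(A,B) \in \mathcal{M} :\ R_{C,U,k}^{(A\to B)} > R_{C,L,k}^{(A\to B)},\ R_{C,U,k}^{(A\to B)} > 0} \frac{R_{C,L,k}^{(A\to B)}}{R_{C,U,k}^{(A\to B)}}$$

monotonically approaches $\rho(C)$ as $k$ grows without bound. Let $\mathcal{N}_{L,k}$, $\mathcal{N}_{U,k}$, and $\mathcal{N}_{a_k U,k}$ be the networks that result when each channel $C \in \mathcal{A}$ is replaced by bit-pipe model $C(\mathcal{R}_{C,L,k})$, $C(\mathcal{R}_{C,U,k})$, and $C(a_k \mathcal{R}_{C,U,k})$, respectively, where $a_k = \min_{C \in \mathcal{A}} a_{C,k}$. Then

$$\mathscr{R}(\mathcal{N}_{a_k U,k}) \subseteq \mathscr{R}(\mathcal{N}_{L,k}) \subseteq \mathscr{R}(\mathcal{N}) \subseteq \mathscr{R}(\mathcal{N}_{U,k})$$

since $a_k \mathcal{R}_{C,U,k} \leq \mathcal{R}_{C,L,k}$ for all $C$ by definition of $a_k$. Network $\mathcal{N}_{a_k U,k}$ is identical to network $\mathcal{N}_{U,k}$ except that the capacity of each bit-pipe model edge has been decreased by factor $a_k$. We next employ Theorem 1 to bound the difference between $\mathscr{R}(\mathcal{N}_{a_k U,k})$ and $\mathscr{R}(\mathcal{N}_{U,k})$. Let $\underline{\mathcal{N}}_{U,k}$ be the $N$-fold stacked network for $\mathcal{N}_{U,k}$ and let $\underline{\mathcal{N}}_{a_k U,k}$ be the $\lceil N/a_k \rceil$-fold stacked network for $\mathcal{N}_{a_k U,k}$. We can operate any $(\mathcal{R}, \lambda)$ solution $S(\underline{\mathcal{N}}_{U,k})$ for $\underline{\mathcal{N}}_{U,k}$ across network $\underline{\mathcal{N}}_{a_k U,k}$ as follows. For each $C \in \mathcal{A}$, transmit the $N\mathcal{R}_{C,U,k}$ bits intended for transmission across $N$ copies of $C(\mathcal{R}_{C,U,k})$ across the $\lceil N/a_k \rceil$ copies of $C(a_k \mathcal{R}_{C,U,k})$ in $\underline{\mathcal{N}}_{a_k U,k}$. Transmissions across the remainder of the network are sent unchanged. Applying $S(\underline{\mathcal{N}}_{U,k})$ across $\underline{\mathcal{N}}_{a_k U,k}$ in this way delivers $N\mathcal{R}$ bits over $\lceil N/a_k \rceil$ layers with error probability $\lambda$. The rate $N\mathcal{R}/\lceil N/a_k \rceil$ approaches $a_k \mathcal{R}$ as $N$ grows without bound. Letting $k$ grow without bound achieves the desired result. ■

Fig. 7. Example upper and lower bounding models for the binary symmetric broadcast channel with error probabilities $p_1$ and $p_1 * p_2$ at its two receivers. The bit-pipe capacities given in (a) and (b) correspond to the independent noise and physically degraded cases, respectively. Lower bounding model: (a) $R_0 = 1 - H(p_1 * p_1 * p_2 * p_2)$, $R_1 = H(p_1 * p_1 * p_2) - H(p_1)$; (b) $R_0 = 1 - H(p_1 * p_2 * p_2)$, $R_1 = H(p_1 * p_2) - H(p_1)$. Upper bounding model: (a) $R_0' = 1 - H(p_1 * p_2)$, $R_1' = H(p_1 * p_1 * p_2) - H(p_1)$; (b) $R_0' = 1 - H(p_1 * p_2)$, $R_1' = H(p_1 * p_2) - H(p_1)$.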
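The limiting step at the end of this proof, that delivering $N\mathcal{R}$ bits over $\lceil N/a_k \rceil$ layers gives a rate approaching $a_k \mathcal{R}$, can be checked numerically. The Python sketch below treats a single scalar rate; the function name and the sample values are hypothetical.

```python
import math


def scaled_rate(R: float, a: float, N: int) -> float:
    """Per-layer rate when N*R bits are delivered over ceil(N/a) layers."""
    return N * R / math.ceil(N / a)
```

Because $\lceil N/a \rceil \geq N/a$, the scaled rate never exceeds $aR$, and the shortfall is at most a factor $1/(1 + a/N)$, which tends to 1 as $N$ grows.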



By [9, Corollary 5], the best upper and lower bound for any memoryless point-to-point channel are the same. Thus $\rho(C) = 1$ for memoryless point-to-point channels. The following examples bound $\rho(C)$ for binary broadcast and multiple access channels with additive noise.

Example 3 Let $C = (\{0,1\},\ p(y^{(j_1,1)}, y^{(j_2,1)} | x^{(i,1)}),\ \{0,1\}^2)$ be a binary symmetric broadcast channel. Then $Y^{(j_1,1)} = X^{(i,1)} \oplus Z_1$ and $Y^{(j_2,1)} = X^{(i,1)} \oplus Z_2$ as shown in Figure 7. Let $p_1 = E Z_1$ and $p_1 * p_2 = p_1(1 - p_2) + p_2(1 - p_1) = E Z_2$. Figure 7 shows example bounding networks. The lower bounding models correspond to points $(R_0, R_1) = (1 - H(a * p_1 * p_2),\ H(a * p_1) - H(p_1))$ on the boundary of the capacity region. The upper bounds are obtained by evaluating Theorem 5. Thus

$$\rho(C) \geq \begin{cases} \dfrac{1 - H(p_1 * p_1 * p_2 * p_2)}{1 - H(p_1 * p_2)} & \text{when the noise at the receivers is independent} \\[2ex] \dfrac{1 - H(p_1 * p_2 * p_2)}{1 - H(p_1 * p_2)} & \text{when the noise is physically degraded,} \end{cases}$$

where the bounds are achieved by setting $a = p_1 * p_2$ and $a = p_2$, respectively. Observing both $Y_1$ and $Y_2$ gives more information when the noise at the two receivers is independent, so $\rho(C)$ is smaller in that case. ■

Fig. 8. Example upper and lower bounding models for the binary adder multiple access channel with error probability $p$. Lower bounding model: $R_1 = a(1 - H(p))$, $R_2 = (1 - a)(1 - H(p))$. Upper bounding model: $R_1' = 1 - H(p)$.

Fig. 9. A variation on the lower bounding model from Figure 8. Lower bounding model: $R_1 = R_2 = \frac{1}{2}(1 - H(p))$, $R_3 = 1 - H(p)$. Upper bounding model: $R_1' = 1 - H(p)$.
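To make the ratios in Example 3 concrete, the following Python sketch evaluates the common-rate bit-pipe capacities for given crossover parameters. It assumes the lower bounding model $(R_0, R_1) = (1 - H(a * p_1 * p_2),\ H(a * p_1) - H(p_1))$ stated in the example and takes the upper bounding value $R_0' = 1 - H(p_1 * p_2)$ as read from the Figure 7 labels; the function names are hypothetical, and entropies are measured in bits.

```python
import math
from functools import reduce


def h2(p: float) -> float:
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def conv(*ps: float) -> float:
    """Binary convolution a * b = a(1-b) + b(1-a), folded over all arguments."""
    return reduce(lambda a, b: a * (1 - b) + b * (1 - a), ps)


def lower_model(a: float, p1: float, p2: float):
    """Lower bounding rates (R0, R1) for parameter a, as stated in Example 3."""
    return 1 - h2(conv(a, p1, p2)), h2(conv(a, p1)) - h2(p1)


def rho_lower_bound(p1: float, p2: float, independent: bool) -> float:
    """Ratio R0 / R0' with a = p1*p2 (independent noise) or a = p2 (degraded)."""
    a = conv(p1, p2) if independent else p2
    r0_lower, _ = lower_model(a, p1, p2)
    r0_upper = 1 - h2(conv(p1, p2))  # assumed upper value, read from Fig. 7
    return r0_lower / r0_upper
```

For instance, `rho_lower_bound(0.1, 0.05, independent=True)` is smaller than `rho_lower_bound(0.1, 0.05, independent=False)`, in line with the closing observation of the example.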

Example 4 Let $C = (\{0,1\}^2,\ p(y^{(j,1)} | x^{(i_1,1)}, x^{(i_2,1)}),\ \{0,1\})$ be a binary adder multiple access channel with $Y^{(j,1)} = X^{(i_1,1)} \oplus X^{(i_2,1)} \oplus Z$. Let $EZ = p$. Figure 8 shows lower and upper bounding models. Each lower bounding model comes from a point on the capacity region. The upper bound evaluates Theorem 6 with $U = c$. The models for this example are quite intuitive. For example, any code designed for network $\mathcal{N}$ can be operated on the given upper bounding model by implementing a memoryless binary adder at the central node. In this case, the topologies of our upper and lower bounding models do not match, but they can be modified to match as shown in Figure 9. Thus

, (c) > lz£M 



Additive bounds are an alternative to the multiplicative bounds described above; this approach may be particularly useful when $R_{C,L}^{(A\to B)} = 0$ for some $(A,B) \in \mathcal{M}$ such that $R_{C,U}^{(A\to B)} > 0$ or when $\mathcal{R}_{C,U}$ incorporates infinite capacity edges for some $C \in \mathcal{A}$. We here restrict our attention to upper and lower bounding networks that are entirely deterministic; that is, we assume that the network is composed of independent channels that have all been replaced by noiseless bit-pipe models. We also focus on demand types for which cut-set bounds are tight on networks of noiseless links. These include multicast demands, multi-source multicast demands, non-overlapping demands on single-source networks, and two-resolution multicast demands on single-source networks (see, for example, [14]).

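Let $\mathscr{R}_c(\mathcal{N})$ be the set of achievable rate vectors for demand types where cut-set bounds are tight on bit-pipe networks, and define

$$\mathscr{R}_{c,L}(\mathcal{N}) = \bigcup_{\mathcal{R}_L \in \prod_{C \in \mathcal{A}} \mathscr{R}(C)} \mathscr{R}_c(\mathcal{N}(\mathcal{R}_L))$$
$$\mathscr{R}_{c,U}(\mathcal{N}) = \bigcap_{\mathcal{R}_U \in \mathrm{int}(\prod_{C \in \mathcal{A}} \mathscr{E}(C))} \mathscr{R}_c(\mathcal{N}(\mathcal{R}_U)).$$

For any $b > 0$, we use

$$\mathscr{R}_{c,L}(\mathcal{N}) \geq \mathscr{R}_{c,U}(\mathcal{N}) - b$$

to specify that $\mathcal{R} \in \mathscr{R}_{c,U}(\mathcal{N})$ implies $[\mathcal{R} - b(1, \ldots, 1)]^+ \in \mathscr{R}_{c,L}(\mathcal{N})$. That is, for any $\mathcal{R} \in \mathscr{R}_{c,U}(\mathcal{N})$, reducing the rate for each demand by $b$ yields an achievable rate vector from $\mathscr{R}_{c,L}(\mathcal{N})$. For any network $\mathcal{N}$ of noiseless bit pipes and any $S \subseteq \{1, \ldots, m\}$, define $\mathrm{val}(\mathcal{N}, S)$ to be the sum of the capacities of all bit pipes with input in $S$ and output in $S^c$. Since bit-pipe models incorporate internal nodes not present in the original network (and therefore not present in the cut-set definitions), we define the value of a cut across a bit-pipe model using the assignment of internal nodes that minimizes the cut's value. To make this precise, again let $V = \{v_A : A \subseteq V_1, |A| > 1\}$ be the set of internal nodes for bit-pipe model $C(\mathcal{R}_C)$ for channel $C$. For any $C \in \mathcal{N}$ and $S \subseteq \{1, \ldots, m\}$, define

$$\mathrm{val}(C(\mathcal{R}_C), S) \stackrel{\mathrm{def}}{=} \begin{cases} \min_{S' = S \cup T :\ T \subseteq V} \mathrm{val}(C(\mathcal{R}_C), S') & \text{if } S \cap V_1 \neq \emptyset \text{ and } S^c \cap V_2 \neq \emptyset \\ 0 & \text{otherwise.} \end{cases}$$

Finally, define $\Delta(C, S)$ as

$$\Delta(C, S) = \min_{(\mathcal{R}_{C,L}, \mathcal{R}_{C,U}) \in \mathscr{R}(C) \times \mathscr{E}(C)} \big[\mathrm{val}(C(\mathcal{R}_{C,U}), S) - \mathrm{val}(C(\mathcal{R}_{C,L}), S)\big].$$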

Lemma 10 For any network $\mathcal{N}$ and any set $S \subseteq \{1, \ldots, m\}$,

$$\mathscr{R}_{c,L}(\mathcal{N}) \geq \mathscr{R}_{c,U}(\mathcal{N}) - \max_{S \subseteq \{1, \ldots, m\}} \sum_{C \in \mathcal{N}} \Delta(C, S).$$

Proof. Since cut-set bounds are tight for the given demand types by assumption, we bound the difference in capacities by bounding the difference in each cut-set using the best choice of the upper and lower bounding models for each cut. ■

Fig. 10. Example models for the Gaussian broadcast channel.
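The cut values used in Lemma 10 can be computed directly for small bit-pipe models. The Python sketch below is a brute-force illustration under stated assumptions: the network is represented as a hypothetical dictionary from directed pipes `(input, output)` to capacities, node names are arbitrary strings, and the value assigned when the cut does not separate $V_1$ from $V_2$ is taken to be 0.

```python
from itertools import chain, combinations
from typing import Dict, Iterable, Set, Tuple

Pipe = Tuple[str, str]


def cut_value(pipes: Dict[Pipe, float], S: Set[str]) -> float:
    """val(N, S): total capacity of pipes with input in S and output outside S."""
    return sum(cap for (u, v), cap in pipes.items() if u in S and v not in S)


def model_cut_value(pipes: Dict[Pipe, float], S: Iterable[str],
                    V1: Iterable[str], V2: Iterable[str],
                    internal: Iterable[str]) -> float:
    """Cut value across a bit-pipe model: minimize over every assignment T of
    internal nodes to the source side, S' = S ∪ T. Returns 0 (assumed) when
    S contains no channel input node or S^c contains no channel output node."""
    S = set(S)
    if not (S & set(V1)) or not (set(V2) - S):
        return 0.0
    nodes = list(internal)
    subsets = chain.from_iterable(combinations(nodes, k) for k in range(len(nodes) + 1))
    return min(cut_value(pipes, S | set(T)) for T in subsets)
```

As a usage example, consider a toy model with transmitters `i1` and `i2`, one internal node `vA`, and receiver `j`, with unit-capacity pipes into `vA` and a pipe of capacity 0.5 from `vA` to `j`. The cut $S = \{i_1, i_2\}$ has value 2 if `vA` is placed on the sink side and 0.5 if it is placed on the source side, so the minimizing assignment gives 0.5.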

Given bounds on $\Delta(C, S)$ for some family of channels, Lemma 10 yields immediate bounds on the accuracy of the capacity bounds resulting from our models. These bounds take the same form as prior bounds in the literature (e.g., [15]). In particular, capacity bounds resulting from our upper and lower
bounding models differ from each other (and therefore from the true capacity) by a constant multiple of 
the number of channels in the network. For networks of Gaussian point-to-point, multiple access, and 
broadcast channels with independent noise at the receivers, this constant is bounded from above by 1/2, 
as shown by the examples that follow; the resulting capacity bounds agree precisely with 1031 for unicast 
and multicast demands. The result here extends to other demand types where cut sets are tight, to tighter 
bounds outside the high-SNR region, and to corresponding results for networks containing broadcast 
channels with dependent noise at the receivers. 

By [9], $\Delta(C, S) = 0$ for all memoryless point-to-point channels. Example 5 bounds $\Delta(C, S)$ for the Gaussian broadcast channel.

Fig. 11. Example models for the Gaussian multiple access channel with power constraints $P_1 \geq P_2$ at transmitters 1 and 2 and variance-$N$ Gaussian noise.

Example 5 Let $C$ be a two-receiver Gaussian broadcast channel $(\mathbb{R},\ p(y^{(j_1,1)}, y^{(j_2,1)} | x^{(i,1)}),\ \mathbb{R}^2)$ with $Y^{(j_1,1)} = a_1 X^{(i,1)} + Z_1$ and $Y^{(j_2,1)} = a_2 X^{(i,1)} + Z_2$ for some jointly Gaussian random variables $Z_1$ and $Z_2$ with $E[(X^{(i,1)})^2] \leq P$, $E[Z_1^2] = N_1$, $E[Z_2^2] = N_2$, $E[Z_1 Z_2] = \rho\sqrt{N_1 N_2}$, and $N_1/a_1^2 \leq N_2/a_2^2$. Figure 10 shows example upper and lower bounding models. The lower bounding model is found by evaluating the broadcast capacity bounds

Pi 



R-2 



1 / (l-a)P 

1 / aP 

2 [ ° g { l+ (l-a)P + N 2 /al 



at 



1 — a 



{^/N~2/a 2 -p^fN~ l /a l f 



(i-p)2 (a 2 P + iS r 2 ) • 

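The upper bounding network is obtained by evaluating the model from Theorem 5. These upper and lower 
bounds imply 

\[ A(C,S) \le \frac{1}{2}\log\left(1 + \frac{P}{N_2/a_2^2}\right) - \frac{1}{2}\log\left(\frac{(P + N_2/a_2^2)^2(1-\rho^2)}{\left(\sqrt{N_2/a_2^2} - \rho\sqrt{N_1/a_1^2}\right)^2 P + (1-\rho^2)(P + N_2/a_2^2)N_2/a_2^2}\right) \]

\[ \phantom{A(C,S)} = \frac{1}{2}\log\left(1 + \frac{P}{P + N_2/a_2^2}\cdot\frac{\left(1 - \rho\sqrt{\frac{N_1/a_1^2}{N_2/a_2^2}}\right)^2}{1-\rho^2}\right). \]
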

When $Z_1$ and $Z_2$ are independent, $\rho = 0$ and the upper bound is 

\[ A(C,S) \le \frac{1}{2}\log\left(1 + \frac{P}{P + N_2/a_2^2}\right), \]

which is at most 1/2 and significantly smaller in the low-SNR region. ■ 
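
The independent-noise bound can be checked numerically. The following is a minimal sketch, assuming base-2 logarithms (so that the result is in bits); the function name `bc_gap_independent` and the chosen SNR grid are illustrative and do not come from the paper.

```python
import math

def bc_gap_independent(P, N2, a2=1.0):
    """Evaluate (1/2) log2(1 + P / (P + N2/a2^2)), the gap bound for independent noise."""
    n2 = N2 / a2 ** 2              # noise power of the weaker receiver, normalized by its gain
    return 0.5 * math.log2(1.0 + P / (P + n2))

# Sweep the transmit power over several orders of magnitude (hypothetical grid).
snr_db_grid = [-30, -20, -10, 0, 10, 20, 40]
gaps = [bc_gap_independent(10 ** (snr_db / 10), N2=1.0) for snr_db in snr_db_grid]
for snr_db, gap in zip(snr_db_grid, gaps):
    print(f"SNR = {snr_db:4d} dB   gap bound = {gap:.4f} bits")
```

Since $P/(P + N_2/a_2^2) < 1$, the argument of the logarithm stays below 2, which is why the bound never reaches 1/2; at low SNR the ratio is close to 0 and the bound is close to 0 as well.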

Example 6 Let $C = (\mathbb{R}^2, p(y^{(j,1)} | x^{(i_1,1)}, x^{(i_2,1)}), \mathbb{R})$ be a Gaussian multiple access channel with $Y^{(j,1)} = 
X^{(i_1,1)} + X^{(i_2,1)} + Z$, $E[(X^{(i_1,1)})^2] \le P_1$, $E[(X^{(i_2,1)})^2] \le P_2$, $P_1 \ge P_2$, and $Z \sim \mathcal{N}(0,N)$. Figure 11 
shows upper and lower bounding models for the given multiple access channel. The lower bound is 
chosen as the corner point 

\[ R_1 = \frac{1}{2}\log\left(1 + \frac{P_1}{N}\right), \qquad R_2 = \frac{1}{2}\log\left(1 + \frac{P_2}{P_1 + N}\right) \]

of the multiple access capacity region. 

The upper bounding network is obtained by evaluating Theorem 6 under the maximizing joint distribution 
on $(X^{(i_1,1)}, X^{(i_2,1)})$ using a statistically dependent distortion-$D$ reproduction $U$ of $X^{(i_1,1)}$ similar to those 
used in lossy source coding. Precisely, 

\[ U = X^{(i_1,1)} + \left(1 + \sqrt{P_2/P_1}\right) Z_1, \qquad Z = Z_1 + Z_2, \]

where $Z_1$ is a Gaussian random variable with mean 0 and variance $N/(1 + \sqrt{P_2/P_1})^2$, $Z_2$ is a Gaussian 
random variable with mean 0 and variance $N(1 - 1/(1 + \sqrt{P_2/P_1})^2)$, and $(X_1,X_2)$, $Z_1$, and $Z_2$ are 
mutually independent. Using this choice of $U$, the upper bound from Theorem 6 is 

\[ R_1 = \frac{1}{2}\log\left(1 + \frac{P_1}{N}\right), \qquad R_2 = \frac{1}{2}\log\left(\frac{(\sqrt{P_1} + \sqrt{P_2})^2 + N}{P_1 + N}\right). \]

Using the given upper and lower bounds yields 

\[ A(C,S) \le \frac{1}{2}\log\left(\frac{(\sqrt{P_1} + \sqrt{P_2})^2 + N}{P_1 + P_2 + N}\right), \]

which is at most 1/2 (and considerably smaller when the signal-to-noise ratio is small). ■ 
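
The rates in Example 6 can be reproduced from Gaussian variances alone. The sketch below is illustrative only: it assumes base-2 logarithms, the fully correlated input $X_2 = \sqrt{P_2/P_1}\,X_1$, and the reading $U = X_1 + (1+\sqrt{P_2/P_1})Z_1$ of the reproduction; the function name `mac_models` is hypothetical. It compares the sum-rate difference between the two models with the closed-form gap.

```python
import math

def mac_models(P1, P2, N):
    """Return (sum-rate gap from variances, closed-form gap); requires P1 >= P2 > 0."""
    c = 1.0 + math.sqrt(P2 / P1)           # Y = c*X1 + Z1 + Z2 when X2 = sqrt(P2/P1)*X1
    var_z1 = N / c ** 2                    # variance of Z1
    var_z2 = N - var_z1                    # variance of Z2, so that Var(Z1 + Z2) = N
    # Lower bounding model: corner point of the multiple access capacity region.
    r1_lo = 0.5 * math.log2(1 + P1 / N)
    r2_lo = 0.5 * math.log2(1 + P2 / (P1 + N))
    # Upper bounding model with U = X1 + c*Z1 (assumed reading of the example).
    var_u = P1 + c ** 2 * var_z1           # equals P1 + N
    var_y = c ** 2 * P1 + N
    cov_yu = c * P1 + c * var_z1
    var_y_given_u = var_y - cov_yu ** 2 / var_u
    r1_up = 0.5 * math.log2(var_u / (c ** 2 * var_z1))   # I(U; X1)
    r2_up = 0.5 * math.log2(var_y_given_u / var_z2)      # I(X1, X2; Y | U)
    gap = (r1_up + r2_up) - (r1_lo + r2_lo)
    closed = 0.5 * math.log2(((math.sqrt(P1) + math.sqrt(P2)) ** 2 + N) / (P1 + P2 + N))
    return gap, closed

for P1, P2, N in [(4.0, 1.0, 1.0), (100.0, 100.0, 1.0), (0.01, 0.005, 1.0)]:
    gap, closed = mac_models(P1, P2, N)
    print(f"P1={P1}, P2={P2}, N={N}: gap={gap:.4f}, closed form={closed:.4f}")
```

The gap is at most 1/2 because $(\sqrt{P_1}+\sqrt{P_2})^2 \le 2(P_1+P_2)$, so the ratio inside the logarithm never exceeds 2.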

Examples 3, 4, 5, and 6 show that for some network types, the upper and lower bounds differ by at most 
an additive or multiplicative constant that depends on the statistics of the network's component channels. 
Given any network $\mathcal{N}$ built from arbitrary point-to-point channels, binary symmetric broadcast channels 
(Example 3), and binary adder multiple access channels (Example 4), Lemma 9 shows that the capacities 
of the derived upper and lower bounding networks differ from the true capacity and each other by at 
most a multiplicative constant $\rho^* = \max_{C \in \mathcal{N}} \rho(C)$. This constant depends on the channel for which the 
distance between our upper and lower bounds is largest but not on the size of the network. Likewise, given 
any network $\mathcal{N}$ built from arbitrary point-to-point channels, Gaussian broadcast channels (Example 5), 
and additive Gaussian multiple access channels (Example 6), Lemma 10 shows that for all demand types 
for which cut-set bounds on the network coding capacity are tight, the capacities of the derived upper 
and lower bounding networks differ from the true capacity and each other by at most an additive constant 
equal to a constant multiple of the number of channels in the network. When the noise at the receivers of 
each broadcast channel is independent, this immediately extends the well-known 1/2-bit per component 
bounds to a variety of other demand types where cut-set bounds on the network coding capacity are 
tight. It also gives tighter bounds outside the high-SNR region and derives the corresponding bounds 
for broadcast channels with statistically dependent noise at the receivers. Of course, Examples 1 and 2 
demonstrate that the lower and upper bounds for some channels are, by necessity, far apart. When such 
large gaps arise, they motivate the investigation of larger network components. For example, modeling 
the network from example Q] not as two independent components but instead as a single component with 
one input and one output yields matching lower and upper bounding models and therefore a precise 
network equivalence. 

VII. Conclusions 

The equivalence tools introduced in this paper are proposed as one step in a new path towards the 
construction of computational tools for bounding the capacities of large networks. Unlike cut-set strategies, 
which investigate networks in their entirety, the approach proposed here is to bound capacities of networks 
by carefully characterizing the behaviors of the individual components from which they are built. As 
described in Lemma 3, the capacity region of an isolated component can be used to calculate lower 
bounds on the capacities of all networks in which the component may be employed. Since capacity 
regions of individual components cannot be used to derive upper bounds (see Example [Q), Theorem [4] 
employs an alternative component characterization - here offered as a complement to the traditional 
capacity problem. Given an arbitrary channel, describe the family of bit-pipe models over which accurate 
channel emulation is possible. The question is essentially a source coding problem: for each vector $X^{V_1}$ 
at the channel input nodes $V_1$, we characterize the family of rate vectors $(R^{(A \to B)} : A \subseteq V_1, B \subseteq V_2)$ 
sufficient for constructing a reproduction $Y^{V_2}$ at the channel output nodes $V_2$ such that $Y^{V_2}$ appears to 
result from the operation of channel $C$ on input $X^{V_1}$. The upper bounding models for the point-to-point, 
broadcast, multiple access, and interference channels are here offered as examples of this characterization 
strategy. Increasing the library of component models offers a route to studying capacities of larger and 
larger families of networks using computational tools for bounding network coding capacities. 
Appendix I 
Average vs. Expected Error Probability in Channel Coding 

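Lemma 11, below, shows that given a blocklength-$N$ channel code with average error probability $P_e^{(N)}$, 
there exists an index assignment such that the code's expected error probability is no greater than 
$P_e^{(N)}$. This is obvious for channels with a single transmitter but more subtle for channels with multiple 
transmitters. The outline of this proof was suggested by [11]. The property is useful since messages 
transmitted across a channel $C$ in the middle of some large network $\mathcal{N}$ need not be equally probable, which 
means that the expected error probability can equal the code's maximal error probability if the codeword 
indices are poorly assigned. We denote the average error probability under channel code $(\alpha_N, \beta_N)$ as 

\[ P_e^{(N)} = \frac{1}{|\underline{\mathcal{X}}^{V_1}|} \sum_{\underline{x}^{V_1} \in \underline{\mathcal{X}}^{V_1}} \Pr\left(\beta_N(Y^{V_2}) \ne \underline{x}^{V_1} \,\middle|\, X^{V_1} = \alpha_N(\underline{x}^{V_1})\right) \]

and the expected error probability of the same code as 

\[ \sum_{\underline{x}^{V_1} \in \underline{\mathcal{X}}^{V_1}} p(\underline{x}^{V_1}) \Pr\left(\beta_N(Y^{V_2}) \ne \underline{x}^{V_1} \,\middle|\, X^{V_1} = \alpha_N(\underline{x}^{V_1})\right). \]

This notation hides the independent operation of the encoders $\alpha_N = (\alpha_N^{(\{i\} \to B)} : (\{i\},B) \in \mathcal{M})$ and 
the decoders $\beta_N = (\beta_N^{(\{i\} \to B, j)} : (\{i\},B) \in \mathcal{M}, j \in B)$. We relabel the codeword indices by applying 
a permutation $\phi^{(\{i\} \to B)}$ on each message set. Given permutations $\phi = (\phi^{(\{i\} \to B)} : (\{i\},B) \in \mathcal{M})$, we 
denote the expected error probability after relabeling the codeword indices by 

\[ \sum_{\underline{x}^{V_1} \in \underline{\mathcal{X}}^{V_1}} p(\underline{x}^{V_1}) \Pr\left(\beta_N(Y^{V_2}) \ne \phi(\underline{x}^{V_1}) \,\middle|\, X^{V_1} = \alpha_N(\phi(\underline{x}^{V_1}))\right), \]

where $\phi(\underline{x}^{V_1}) = (\phi^{(\{i\} \to B)}(\underline{x}^{(\{i\} \to B)}) : (\{i\},B) \in \mathcal{M})$. 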

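Lemma 11 ([11]) Let $(\alpha_N, \beta_N)$ be a blocklength-$N$ channel code for channel $C$ with transmitters $V_1$ and 
receivers $V_2$. For any distribution $p(\cdot)$ on the space $\underline{\mathcal{X}}^{V_1} = \prod_{(\{i\},B) \in \mathcal{M}} \mathcal{W}^{(\{i\} \to B)}$ of possible transmissions, 
there exist independent permutations $\phi = (\phi^{(\{i\} \to B)} : (\{i\},B) \in \mathcal{M})$ of the transmission indices for which 

\[ \sum_{\underline{x}^{V_1} \in \underline{\mathcal{X}}^{V_1}} p(\underline{x}^{V_1}) \Pr\left(\beta_N(Y^{V_2}) \ne \phi(\underline{x}^{V_1}) \,\middle|\, X^{V_1} = \alpha_N(\phi(\underline{x}^{V_1}))\right) \le P_e^{(N)}. \]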

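Proof. For each $(\{i\},B) \in \mathcal{M}$, choose permutation $\Phi^{(\{i\} \to B)}$ uniformly at random from the space of 
possible permutations on $\mathcal{W}^{(\{i\} \to B)}$. Then, using $E_\Phi[\cdot]$ to denote the expectation with respect to the random 
permutation choice, the expected error probability of the resulting channel code is 

\[ E_\Phi\left[\sum_{\underline{x}^{V_1} \in \underline{\mathcal{X}}^{V_1}} p(\underline{x}^{V_1}) \Pr\left(\beta_N(Y^{V_2}) \ne \Phi(\underline{x}^{V_1}) \,\middle|\, X^{V_1} = \alpha_N(\Phi(\underline{x}^{V_1}))\right)\right] \]

\[ = \sum_{\underline{x}^{V_1} \in \underline{\mathcal{X}}^{V_1}} p(\underline{x}^{V_1}) E_\Phi\left[\Pr\left(\beta_N(Y^{V_2}) \ne \Phi(\underline{x}^{V_1}) \,\middle|\, X^{V_1} = \alpha_N(\Phi(\underline{x}^{V_1}))\right)\right] \]

\[ \stackrel{(a)}{=} \sum_{\underline{x}^{V_1} \in \underline{\mathcal{X}}^{V_1}} p(\underline{x}^{V_1}) \left[\sum_{\underline{w}^{V_1} \in \underline{\mathcal{X}}^{V_1}} \frac{1}{|\underline{\mathcal{X}}^{V_1}|} \Pr\left(\beta_N(Y^{V_2}) \ne \underline{w}^{V_1} \,\middle|\, X^{V_1} = \alpha_N(\underline{w}^{V_1})\right)\right] = P_e^{(N)}, \]

where (a) holds since all codewords are equally probable under the uniform distribution on permutations. 
The result follows since the optimal choice of permutations $(\phi^{(\{i\} \to B)} : (\{i\},B) \in \mathcal{M})$ achieves expected 
error probability no greater than that achieved by the given random permutation choice. ■ 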
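
The averaging argument can be illustrated on a toy two-transmitter code. The sketch below is not taken from the paper: the per-codeword error table `err` and the message distribution `p` are made-up numbers, and the function names are hypothetical. It enumerates every pair of independent relabelings, confirms that the mean expected error over all relabelings equals the average error, and then picks the best relabeling.

```python
from itertools import permutations

def expected_error(p, err, perm1, perm2):
    """Expected error when message (w1, w2) is sent with codeword (perm1[w1], perm2[w2])."""
    return sum(p[w1][w2] * err[perm1[w1]][perm2[w2]]
               for w1 in range(len(perm1))
               for w2 in range(len(perm2)))

def relabeling_summary(p, err):
    """Return (average error, mean over all relabelings, best relabeling's expected error)."""
    n1, n2 = len(err), len(err[0])
    average = sum(sum(row) for row in err) / (n1 * n2)
    values = [expected_error(p, err, a, b)
              for a in permutations(range(n1))     # relabeling at transmitter 1
              for b in permutations(range(n2))]    # independent relabeling at transmitter 2
    return average, sum(values) / len(values), min(values)

# Hypothetical toy numbers: err[w1][w2] is the error probability of codeword pair (w1, w2).
err = [[0.0, 0.1, 0.5], [0.2, 0.0, 0.3]]
p = [[0.05, 0.05, 0.6], [0.1, 0.1, 0.1]]           # non-uniform message distribution
average, mean_over_perms, best = relabeling_summary(p, err)
print(average, mean_over_perms, best)
```

Because the minimum over relabelings cannot exceed their mean, the best relabeling always has expected error no greater than the average error, which is the content of the lemma.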



Appendix II 
Typical Set Notation and Tools 

The appendices that follow define typical sets for many combinations of random variables and many 
parameter values. The following definitions are useful for streamlining the exposition. Given a random 
variable $Z$ drawn from distribution $p(z)$ on alphabet $\mathcal{Z}$ and an $N$-vector $\underline{z} \in \mathcal{Z}^N$, define 

\[ f(\underline{z}) \stackrel{\rm def}{=} \left| -\frac{1}{N}\log p(\underline{z}) - H(Z) \right|, \]

where $p(\underline{z}) \stackrel{\rm def}{=} \prod_{\ell=1}^{N} p(z(\ell))$ and $H(Z)$ is the (discrete or differential) entropy of random variable $Z$. 
The random variable and distribution are implicit, with $f(\underline{x})$ and $f(\underline{y})$ referring to random variables $X$ 
and $Y$, respectively. For example, the usual jointly typical set for $(X, Y)$ is here expressed as 

\[ A_\epsilon^{(N)} = \{(\underline{x},\underline{y}) : f(\underline{x}) \le \epsilon, \; f(\underline{y}) \le \epsilon, \; f(\underline{x},\underline{y}) \le \epsilon\}. \]

For each collection of random variables for which we define a typical set, we also define a restricted 
typical set $\hat{A}_\epsilon^{(N)} \subseteq A_\epsilon^{(N)}$ and an indicator function $K(\cdot)$ that equals one for values in $\hat{A}_\epsilon^{(N)}$ and 0 otherwise. 
The formal definitions for the restricted typical sets are given in the appendices that follow. When multiple 
restricted typical sets are in use we distinguish between them either by context or by adding arguments. 
For example, $(\underline{X}, \underline{Y}) \in \hat{A}_\epsilon^{(N)}$ and $\hat{A}_\epsilon^{(N)}(X, Y)$ refer to the same restricted typical set. A summary of 
definitions and results from [9] follows. 
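
As a concrete illustration of the typicality function for a finite alphabet, the following sketch assumes the reading $f(\underline{z}) = |-\frac{1}{N}\log p(\underline{z}) - H(Z)|$ with base-2 logarithms. The helper names (`entropy`, `f`, `jointly_typical`) and the dictionary representation of distributions are choices made for this example only.

```python
import math

def entropy(pmf):
    """Entropy in bits of a distribution given as {symbol: probability}."""
    return -sum(q * math.log2(q) for q in pmf.values() if q > 0)

def f(seq, pmf):
    """Deviation of the empirical log-likelihood rate of seq from the entropy."""
    log_p = sum(math.log2(pmf[s]) for s in seq)    # log of the i.i.d. product probability
    return abs(-log_p / len(seq) - entropy(pmf))

def jointly_typical(xs, ys, pxy, eps):
    """Check f(x) <= eps, f(y) <= eps, and f(x, y) <= eps for a joint pmf {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), q in pxy.items():                  # marginalize the joint distribution
        px[x] = px.get(x, 0.0) + q
        py[y] = py.get(y, 0.0) + q
    pairs = list(zip(xs, ys))
    return f(xs, px) <= eps and f(ys, py) <= eps and f(pairs, pxy) <= eps

pmf = {0: 0.75, 1: 0.25}
print(f([0, 0, 0, 1], pmf))   # empirical frequencies match pmf, so f is (numerically) zero
print(f([0, 0, 0, 0], pmf))   # all-zero sequence is far from typical
```

A sequence whose empirical frequencies match the distribution has $f$ equal to zero, while sequences with skewed frequencies have larger values and fall outside the typical set for small $\epsilon$.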
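Given any distribution $p(u, v)$ and any constant $\epsilon > 0$, define 

\[ a(\epsilon) \stackrel{\rm def}{=} (1 + \epsilon) \cdot \inf\left\{\epsilon' > 0 : p\left(f(\underline{V}) > \epsilon' \vee f(\underline{U}, \underline{V}) > \epsilon'\right) \le 2^{-N6\epsilon} \;\; \forall N \text{ sufficiently large}\right\} \]

\[ A_\epsilon^{(N)} \stackrel{\rm def}{=} \{(\underline{u},\underline{v}) : f(\underline{u}) \le \epsilon, \; f(\underline{v}) \le a(\epsilon), \; f(\underline{u},\underline{v}) \le a(\epsilon)\} \]

\[ \hat{A}_\epsilon^{(N)} \stackrel{\rm def}{=} \left\{(\underline{u},\underline{v}) \in A_\epsilon^{(N)} : p\left(f(\underline{V}) > a(\epsilon) \vee f(\underline{u},\underline{V}) > a(\epsilon) \,\middle|\, \underline{U} = \underline{u}\right) \le 2^{-N3\epsilon}\right\} \]

\[ K(\underline{u},\underline{v}) \stackrel{\rm def}{=} \begin{cases} 1 & \text{if } (\underline{u},\underline{v}) \in \hat{A}_\epsilon^{(N)} \\ 0 & \text{otherwise.} \end{cases} \]

Lemma 12 [9, Lemma 6] Let $(\underline{U}, \underline{V})$ be drawn i.i.d. $p(u, v)$. Then 

\[ p\left((\hat{A}_\epsilon^{(N)}(U,V))^c\right) \le 2^{-Nc(\epsilon)} \]

for some constant $c(\epsilon) > 0$ and all $N$ sufficiently large. Constant $c(\epsilon)$ approaches 0 as $\epsilon$ approaches 0. 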

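Design random source code $(\alpha_N, \beta_N)$ by drawing codewords $\beta_N(1), \ldots, \beta_N(2^{NR})$ i.i.d. $p(\underline{v})$ and choosing 
$\alpha_N(\underline{u})$ uniformly at random from the indices $w \in \{1, \ldots, 2^{NR}\}$ for which codeword $\beta_N(w)$ satisfies 
$(\underline{u}, \beta_N(w)) \in \hat{A}_\epsilon^{(N)}$; $\alpha_N(\underline{u})$ is set to 1 if no index $w$ satisfies this constraint. Define 

\[ \hat{p}(\underline{v}|\underline{u}) = \Pr(\beta_N(\alpha_N(\underline{u})) = \underline{v}), \]

and for any $A \subseteq \mathcal{U}^N \times \mathcal{V}^N$, let $\hat{p}(A|\underline{u}) = \sum_{\underline{v} : (\underline{u},\underline{v}) \in A} \hat{p}(\underline{v}|\underline{u})$. 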
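
The random source code design above can be sketched in a few lines. This is an illustrative sketch only: `sample_v` (a sampler for one symbol of $p(v)$) and `is_restricted_typical` (a stand-in for membership in the restricted typical set) are hypothetical callables supplied by the caller, not functions defined in the paper.

```python
import random

def design_codebook(num_codewords, N, sample_v, rng):
    """Draw num_codewords codewords of length N, each symbol i.i.d. from sample_v(rng)."""
    return [[sample_v(rng) for _ in range(N)] for _ in range(num_codewords)]

def encode(u, codebook, is_restricted_typical, rng):
    """Pick uniformly among the indices (1-based) whose codeword matches u; default to 1."""
    matches = [w for w, v in enumerate(codebook, start=1) if is_restricted_typical(u, v)]
    return rng.choice(matches) if matches else 1

def decode(w, codebook):
    """Return the codeword stored under the 1-based index w."""
    return codebook[w - 1]

rng = random.Random(0)
codebook = design_codebook(8, 4, lambda r: r.randint(0, 1), rng)
# Toy matching rule standing in for the typicality test: agree in at least 3 of 4 positions.
close = lambda u, v: sum(a == b for a, b in zip(u, v)) >= 3
w = encode([0, 1, 1, 0], codebook, close, rng)
print(w, decode(w, codebook))
```

The emulated conditional distribution in the text is the distribution of `decode(encode(u, ...), codebook)` over the random codebook and the random index choice.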

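Lemma 13 [9, Lemma 9] For any $(\underline{u}, \underline{v}) \in \hat{A}_\epsilon^{(N)}$, 

\[ \hat{p}(\underline{v}|\underline{u}) \le p(\underline{v}|\underline{u}) \, 2^{N(4a(\epsilon) + 2\epsilon + 1/N)}. \]

Lemma 14 [9, Lemma 10] 

\[ \hat{p}\left((\hat{A}_\epsilon^{(N)})^c \,\middle|\, \underline{u}\right) \le p\left((\hat{A}_\epsilon^{(N)})^c \,\middle|\, \underline{u}\right) + e^{-2^{N(R - I(U;V) - 2a(\epsilon) - \epsilon)}}. \]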



Appendix III 
Broadcast Channels 

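We begin by defining the typical sets used in the proof of Theorem 5. That proof appears later in this 
section. We here employ notation and results developed in Appendix II. 

Given any $p(x,y_1,y_2)$, fix $\epsilon = (\epsilon_1,\epsilon_2)$ with $\epsilon_1,\epsilon_2 > 0$, and let 

\[ a_1(\epsilon_1) \stackrel{\rm def}{=} (1 + \epsilon_1) \cdot \inf\left\{\epsilon' > 0 : p\left(f(\underline{Y}_2) > \epsilon' \vee f(\underline{X},\underline{Y}_2) > \epsilon'\right) \le 2^{-N6\epsilon_1} \;\; \forall N \text{ suff. large}\right\}. \]

For $p(x,y_2)$ the typical set is 

\[ A_\epsilon^{(N)}(X,Y_2) = \left\{(\underline{x},\underline{y}_2) \in \mathcal{X}^N \times \mathcal{Y}_2^N : f(\underline{x}) \le \epsilon_1, \; f(\underline{y}_2) \le a_1(\epsilon_1), \; f(\underline{x},\underline{y}_2) \le a_1(\epsilon_1)\right\}, \]

which we restrict to 

\[ \hat{A}_\epsilon^{(N)}(X,Y_2) = \left\{(\underline{x},\underline{y}_2) \in A_\epsilon^{(N)}(X,Y_2) : p\left((A_\epsilon^{(N)}(\underline{X},\underline{Y}_2))^c \,\middle|\, \underline{x}\right) \le 2^{-3N\epsilon_1}\right\}. \]

Let 

\[ a_2(\epsilon_2) \stackrel{\rm def}{=} (1 + \epsilon_2) \cdot \inf\left\{\epsilon' > 0 : p\left(f(\underline{Y}_2) > \epsilon' \vee f(\underline{Y}_1,\underline{Y}_2) > \epsilon' \vee f(\underline{X},\underline{Y}_2) > \epsilon' \vee f(\underline{X},\underline{Y}_1,\underline{Y}_2) > \epsilon'\right) \le 2^{-N6\epsilon_2} \;\; \forall N \text{ suff. large}\right\}. \]

For distribution $p(x, y_1, y_2)$, the typical set is 

\[ A_\epsilon^{(N)}(X,Y_1,Y_2) = \left\{(\underline{x},\underline{y}_1,\underline{y}_2) : f(\underline{y}_2) \le a_2(\epsilon_2), \; f(\underline{x},\underline{y}_2) \le a_2(\epsilon_2), \; f(\underline{y}_1,\underline{y}_2) \le a_2(\epsilon_2), \; f(\underline{x},\underline{y}_1,\underline{y}_2) \le a_2(\epsilon_2)\right\}, \]

which we restrict to 

\[ \hat{A}_\epsilon^{(N)}(X,Y_1,Y_2) = \left\{(\underline{x},\underline{y}_1,\underline{y}_2) \in A_\epsilon^{(N)}(X,Y_1,Y_2) : p\left((A_\epsilon^{(N)}(\underline{X},\underline{Y}_1,\underline{Y}_2))^c \,\middle|\, \underline{x},\underline{y}_2\right) \le 2^{-3N\epsilon_2}\right\}. \]

By Lemma 15, the probability of observing atypical elements is asymptotically negligible. 

Lemma 15 If $(\underline{X},\underline{Y}_1,\underline{Y}_2)$ are drawn i.i.d. $p(x, y_1, y_2)$, then 

\[ p\left((\hat{A}_\epsilon^{(N)}(X,Y_2))^c\right) \le 2^{-Nc_1(\epsilon_1)} \]
\[ p\left((\hat{A}_\epsilon^{(N)}(X,Y_1,Y_2))^c\right) \le 2^{-Nc_2(\epsilon_2)} \]

for some constants $c_1(\epsilon_1), c_2(\epsilon_2) > 0$ and all $N$ sufficiently large. Constants $c_1(\epsilon_1)$ and $c_2(\epsilon_2)$ approach 
zero as $\epsilon_1$ and $\epsilon_2$, respectively, decay to zero. 

Proof. Like Lemma 12, the result follows from Chernoff's bound and the definition of $\hat{A}_\epsilon^{(N)}$. ■ 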

Proof of Theorem 5: Since $R^{(\{i\} \to \{j_2\})}$ is not bounded from below, we set it to 0. For concision, we 
further define $R_0 \stackrel{\rm def}{=} R^{(\{i\} \to \{j_1,j_2\})}$ and $R_1 \stackrel{\rm def}{=} R^{(\{i\} \to \{j_1\})}$ and use $C = (\mathcal{X}, p(y_1,y_2|x), \mathcal{Y}_1 \times \mathcal{Y}_2)$ in place 
of $C = (\mathcal{X}^{(i,1)}, p(y^{(j_1,1)}, y^{(j_2,1)} | x^{(i,1)}), \mathcal{Y}^{(j_1,1)} \times \mathcal{Y}^{(j_2,1)})$ both in this proof and its supporting lemmas. 

Fix $(R_0,R_1)$ to satisfy the theorem constraints. Suppose that $R_1 > I(X;Y_1|Y_2)$; for any rate pair 
satisfying the theorem assumptions but not satisfying this bound, we can operate the code as if this 
condition were satisfied by using part of the common rate to carry private information for receiver $j_1$. 
By Theorem 4, it suffices to show that for any channel input distribution $p(x)$ there exists a sequence of 
rate-$(R_0, R_1)$ random emulation codes $(\alpha_N, \beta_N)$ for which the resulting emulation distribution 

\[ \hat{p}(\underline{y}_1,\underline{y}_2|\underline{x}) = \Pr\left(\beta_N(\alpha_N(\underline{x})) = (\underline{y}_1,\underline{y}_2)\right) \]



satisfies 

> v < 2-^M 






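for some positive function $\eta(\nu)$ dependent on $p(x)$ for which $\eta(\nu)$ goes to zero as $\nu$ goes to zero. 

We employ the definitions for the (restricted) typical set $\hat{A}_\epsilon^{(N)}$ from the beginning of this section. We 
distinguish between these sets either by context (e.g., $(\underline{x}, \underline{y}_2) \in \hat{A}_\epsilon^{(N)}$ refers to the typical set for 
$p(x,y_2)$) or by adding arguments (e.g., $\hat{A}_\epsilon^{(N)}(X,Y_2)$). Typical sets $\hat{A}_\epsilon^{(N)}(X,Y_2)$ and $\hat{A}_\epsilon^{(N)}(X,Y_1,Y_2)$ 
employ parameters $\epsilon_1$ and $\epsilon_2$, respectively. 

Next, we define codes $(\alpha_N, \beta_N)$ to emulate the typical behavior of channel $C$ under input distribution 
$p(\underline{x}) = \prod_{\ell=1}^{N} p(x(\ell))$. Recall that $(\alpha_N, \beta_N)$ has encoders 

\[ \alpha_N = (\alpha_N^{(A \to B)} : (A, B) \in \mathcal{M}) = \left(\alpha_N^{(\{i\} \to \{j_1\})}, \alpha_N^{(\{i\} \to \{j_2\})}, \alpha_N^{(\{i\} \to \{j_1,j_2\})}\right) \]

at rates $R_1 = R^{(\{i\} \to \{j_1\})}$, $R^{(\{i\} \to \{j_2\})} = 0$, and $R_0 = R^{(\{i\} \to \{j_1,j_2\})}$ and decoders $\beta_N = (\beta_N^{(j_1)}, \beta_N^{(j_2)})$. 
Rate 0 requires no encoder. We abbreviate the notation for the remaining encoders to $\alpha_N^{(1)} = \alpha_N^{(\{i\} \to \{j_1\})}$ 
and $\alpha_N^{(0)} = \alpha_N^{(\{i\} \to \{j_1,j_2\})}$ and for the decoders to $\beta_N^{(1)} = \beta_N^{(j_1)}$ and $\beta_N^{(2)} = \beta_N^{(j_2)}$. Thus 

\[ \alpha_N^{(0)} : \mathcal{X}^N \to \mathcal{W}_0, \quad \alpha_N^{(1)} : \mathcal{X}^N \to \mathcal{W}_1, \quad \beta_N^{(1)} : \mathcal{W}_0 \times \mathcal{W}_1 \to \mathcal{Y}_1^N, \quad \beta_N^{(2)} : \mathcal{W}_0 \to \mathcal{Y}_2^N, \]

where $\mathcal{W}_0 = \underline{\mathcal{X}}^{(\{i\} \to \{j_1,j_2\})} = \{0, 1\}^{NR_0}$ and $\mathcal{W}_1 = \underline{\mathcal{X}}^{(\{i\} \to \{j_1\})} = \{0, 1\}^{NR_1}$. For the random code 
design, first draw codewords $\{\beta_N^{(2)}(w_0) : w_0 \in \mathcal{W}_0\}$ i.i.d. according to distribution $\prod_{\ell=1}^{N} p(y_2(\ell))$. Then, for 
each $w_0 \in \mathcal{W}_0$ draw codewords $\{\beta_N^{(1)}(w_0, w_1) : w_1 \in \mathcal{W}_1\}$ i.i.d. according to $\prod_{\ell=1}^{N} p(y_1(\ell) | \beta_N^{(2)}(w_0,\ell))$, 
where $\beta_N^{(2)}(w_0,\ell)$ denotes the $\ell$th component of $N$-vector $\beta_N^{(2)}(w_0)$. For the random encoder design, 
choose index $\alpha_N^{(0)}(\underline{x})$ uniformly at random from those $w_0 \in \mathcal{W}_0$ for which $(\underline{x}, \beta_N^{(2)}(w_0)) \in \hat{A}_\epsilon^{(N)}$; if 
there is no such $w_0$, then set $\alpha_N^{(0)}(\underline{x})$ to 1. Let $w_0 = \alpha_N^{(0)}(\underline{x})$; then choose index $\alpha_N^{(1)}(\underline{x})$ uniformly at 
random from those $w_1 \in \mathcal{W}_1$ for which $(\underline{x}, \beta_N^{(1)}(w_0, w_1), \beta_N^{(2)}(w_0)) \in \hat{A}_\epsilon^{(N)}$. If there is no such $w_1$, then 
set $\alpha_N^{(1)}(\underline{x})$ to 1. 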



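By Lemma 16, below, 

\[ \hat{p}(\underline{y}_2|\underline{x}) \le 2^{N(4a_1(\epsilon_1) + 2\epsilon_1 + 1/N)} p(\underline{y}_2|\underline{x}) \quad \forall (\underline{x},\underline{y}_2) \in \hat{A}_\epsilon^{(N)} \]
\[ \hat{p}(\underline{y}_1|\underline{x},\underline{y}_2) \le 2^{N(8a_2(\epsilon_2) + 1/N)} p(\underline{y}_1|\underline{x},\underline{y}_2) \quad \forall (\underline{x},\underline{y}_1,\underline{y}_2) \in \hat{A}_\epsilon^{(N)}. \]

Thus 

\[ \hat{p}(\underline{y}_1,\underline{y}_2|\underline{x}) = \hat{p}(\underline{y}_2|\underline{x}) \, \hat{p}(\underline{y}_1|\underline{x},\underline{y}_2) \le 2^{N(4a_1(\epsilon_1) + 2\epsilon_1 + 8a_2(\epsilon_2) + 2/N)} p(\underline{y}_1,\underline{y}_2|\underline{x}) \]

for all $(\underline{x},\underline{y}_1,\underline{y}_2)$ for which $(\underline{x}, \underline{y}_2) \in \hat{A}_\epsilon^{(N)}$ and $(\underline{x},\underline{y}_1,\underline{y}_2) \in \hat{A}_\epsilon^{(N)}$. By Lemma 17, below, 

\[ \hat{p}\left((\underline{x},\underline{Y}_2) \notin \hat{A}_\epsilon^{(N)} \,\middle|\, \underline{x}\right) \le e^{-2^{N(R_0 - I(X;Y_2) - 2a_1(\epsilon_1) - \epsilon_1)}} + p\left((\hat{A}_\epsilon^{(N)}(\underline{x},\underline{Y}_2))^c \,\middle|\, \underline{x}\right) \]
\[ \hat{p}\left((\underline{x},\underline{Y}_1,\underline{y}_2) \notin \hat{A}_\epsilon^{(N)} \,\middle|\, \underline{x},\underline{y}_2\right) \le e^{-2^{N(R_1 - I(X;Y_1|Y_2) - 4a_2(\epsilon_2))}} + p\left((\hat{A}_\epsilon^{(N)}(\underline{x},\underline{Y}_1,\underline{Y}_2))^c \,\middle|\, \underline{x},\underline{y}_2\right). \]

By Lemma 15, above, 

\[ p\left((\hat{A}_\epsilon^{(N)}(\underline{X},\underline{Y}_2))^c\right) \le 2^{-Nc_1(\epsilon_1)} \]
\[ p\left((\hat{A}_\epsilon^{(N)}(\underline{X},\underline{Y}_1,\underline{Y}_2))^c\right) \le 2^{-Nc_2(\epsilon_2)} \]

for some constants $c_1(\epsilon_1)$ and $c_2(\epsilon_2)$ that go to zero as $\epsilon_1$ and $\epsilon_2$ go to zero. 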
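Thus when $\nu = 4a_1(\epsilon_1) + 3\epsilon_1 + 8a_2(\epsilon_2)$ and $N$ is sufficiently large, 

\[ P_e^{(N)}(\nu) \le \sum_{(\underline{x},\underline{y}_1,\underline{y}_2) : \, (\underline{x},\underline{y}_2) \notin \hat{A}_\epsilon^{(N)} \text{ or } (\underline{x},\underline{y}_1,\underline{y}_2) \notin \hat{A}_\epsilon^{(N)}} p(\underline{x}) \hat{p}(\underline{y}_1,\underline{y}_2|\underline{x}) \]

\[ \le \sum_{\underline{x}} \hat{p}\left((\hat{A}_\epsilon^{(N)}(\underline{X},\underline{Y}_2))^c \,\middle|\, \underline{x}\right) p(\underline{x}) + \sum_{(\underline{x},\underline{y}_2) \in \hat{A}_\epsilon^{(N)}} \hat{p}\left((\hat{A}_\epsilon^{(N)}(\underline{x},\underline{Y}_1,\underline{y}_2))^c \,\middle|\, \underline{x},\underline{y}_2\right) \hat{p}(\underline{y}_2|\underline{x}) p(\underline{x}) \]

\[ \le \sum_{\underline{x}} \left(e^{-2^{N(R_0 - I(X;Y_2) - 2a_1(\epsilon_1) - \epsilon_1)}} + p\left((\hat{A}_\epsilon^{(N)}(\underline{x},\underline{Y}_2))^c \,\middle|\, \underline{x}\right)\right) p(\underline{x}) + e^{-2^{N(R_1 - I(X;Y_1|Y_2) - 4a_2(\epsilon_2))}} \]
\[ \qquad + 2^{N(4a_1(\epsilon_1) + 3\epsilon_1)} \sum_{(\underline{x},\underline{y}_2)} p\left((\hat{A}_\epsilon^{(N)}(\underline{x},\underline{Y}_1,\underline{Y}_2))^c \,\middle|\, \underline{x},\underline{y}_2\right) p(\underline{y}_2|\underline{x}) p(\underline{x}) \]

\[ \le e^{-2^{N(R_0 - I(X;Y_2) - 2a_1(\epsilon_1) - \epsilon_1)}} + 2^{-Nc_1(\epsilon_1)} + e^{-2^{N(R_1 - I(X;Y_1|Y_2) - 4a_2(\epsilon_2))}} + 2^{-N(c_2(\epsilon_2) - 4a_1(\epsilon_1) - 3\epsilon_1)}. \]

Thus for all $N$ sufficiently large, $P_e^{(N)}(\nu)$ can be made to decay exponentially to zero by choosing $\epsilon_1$ 
such that $2a_1(\epsilon_1) + \epsilon_1 < R_0 - I(X;Y_2)$ and $\epsilon_2$ such that $4a_2(\epsilon_2) < R_1 - I(X;Y_1|Y_2)$ and $c_2(\epsilon_2) > 4a_1(\epsilon_1) + 3\epsilon_1$. 
The resulting exponent decays to zero as $\epsilon_1$ and $\epsilon_2$ decay to zero. ■ 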

Lemmas 16 and 17, below, bound the conditional probability of $(\underline{Y}_1,\underline{Y}_2)$ given $\underline{X}$ when we emulate the 
broadcast channel with the random code defined in the proof of Theorem 5. 

Lemma 16 If $(\underline{x},\underline{y}_2) \in \hat{A}_\epsilon^{(N)}$, then 

\[ \hat{p}(\underline{y}_2|\underline{x}) \le 2^{N(4a_1(\epsilon_1) + 2\epsilon_1 + 1/N)} p(\underline{y}_2|\underline{x}); \]

if, further, $(\underline{x}, \underline{y}_1, \underline{y}_2) \in \hat{A}_\epsilon^{(N)}$, then 

\[ \hat{p}(\underline{y}_1|\underline{x},\underline{y}_2) \le 2^{N(8a_2(\epsilon_2) + 1/N)} p(\underline{y}_1|\underline{x},\underline{y}_2). \]

Proof. The first bound is precisely Lemma 13 by the definition of $\hat{A}_\epsilon^{(N)}$. The proof of the second bound is 
almost identical except in this case codewords are drawn according to $p(\underline{y}_1|\underline{y}_2)$. This leads to both the extra 
variable in the condition and the slightly larger exponent in the bound. ■ 

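Lemma 17 

\[ \hat{p}\left((\underline{x},\underline{Y}_2) \notin \hat{A}_\epsilon^{(N)} \,\middle|\, \underline{x}\right) \le e^{-2^{N(R_0 - I(X;Y_2) - 2a_1(\epsilon_1) - \epsilon_1)}} + p\left((\hat{A}_\epsilon^{(N)}(\underline{x},\underline{Y}_2))^c \,\middle|\, \underline{x}\right) \]
\[ \hat{p}\left((\underline{x},\underline{Y}_1,\underline{y}_2) \notin \hat{A}_\epsilon^{(N)} \,\middle|\, \underline{x},\underline{y}_2\right) \le e^{-2^{N(R_1 - I(X;Y_1|Y_2) - 4a_2(\epsilon_2))}} + p\left((\hat{A}_\epsilon^{(N)}(\underline{x},\underline{Y}_1,\underline{Y}_2))^c \,\middle|\, \underline{x},\underline{y}_2\right) \]

Proof. The given code fails to find a jointly typical reproduction $(\underline{Y}_1,\underline{Y}_2)$ for $\underline{x}$ if either stage of its 
encoder fails. The first stage fails with probability 

\[ \hat{p}\left((\hat{A}_\epsilon^{(N)}(\underline{x},\underline{Y}_2))^c \,\middle|\, \underline{x}\right) \le p\left((\hat{A}_\epsilon^{(N)}(\underline{x},\underline{Y}_2))^c \,\middle|\, \underline{x}\right) + e^{-2^{N(R_0 - I(X;Y_2) - 2a_1(\epsilon_1) - \epsilon_1)}} \]

by Lemma 14. Otherwise, let $\underline{y}_2$ be the first-stage codeword with $(\underline{x}, \underline{y}_2) \in \hat{A}_\epsilon^{(N)}$. If $(\underline{x}, \underline{y}_2)$ satisfies 
$p((A_\epsilon^{(N)}(\underline{x},\underline{Y}_1,\underline{Y}_2))^c | \underline{x},\underline{y}_2) > 2^{-3N\epsilon_2}$, then $\hat{p}((\hat{A}_\epsilon^{(N)}(\underline{x},\underline{Y}_1,\underline{y}_2))^c | \underline{x},\underline{y}_2) = p((\hat{A}_\epsilon^{(N)}(\underline{x},\underline{Y}_1,\underline{Y}_2))^c | \underline{x},\underline{y}_2) = 
1$ by definition of $\hat{A}_\epsilon^{(N)}$. Otherwise $(\underline{x},\underline{Y}_1,\underline{y}_2) \notin \hat{A}_\epsilon^{(N)}$ implies that encoder $\alpha_N^{(1)}$ failed to find a jointly 
typical codeword $\underline{y}_1$ for $(\underline{x}, \underline{y}_2)$. Thus 

\[ \hat{p}\left((\hat{A}_\epsilon^{(N)}(\underline{x},\underline{Y}_1,\underline{y}_2))^c \,\middle|\, \underline{x},\underline{y}_2\right) \le \left[\sum_{\underline{y}_1} p(\underline{y}_1|\underline{y}_2)\left(1 - K(\underline{x},\underline{y}_1,\underline{y}_2)\right)\right]^{2^{NR_1}}. \]

When $K(\underline{x}, \underline{y}_1, \underline{y}_2) = 1$, the usual bounds on the probabilities of typical strings give 

\[ p(\underline{y}_1|\underline{y}_2) = p(\underline{y}_1|\underline{x},\underline{y}_2) \frac{p(\underline{x},\underline{y}_2) \, p(\underline{y}_1,\underline{y}_2)}{p(\underline{y}_2) \, p(\underline{x},\underline{y}_1,\underline{y}_2)} \ge p(\underline{y}_1|\underline{x},\underline{y}_2) \, 2^{-N(I(X;Y_1|Y_2) + 4a_2(\epsilon_2))}. \]

Therefore, since $(1 - ab)^n \le 1 - a + e^{-bn}$, 

\[ \hat{p}\left((\hat{A}_\epsilon^{(N)}(\underline{x},\underline{Y}_1,\underline{y}_2))^c \,\middle|\, \underline{x},\underline{y}_2\right) \le \left(1 - 2^{-N(I(X;Y_1|Y_2) + 4a_2(\epsilon_2))} \sum_{\underline{y}_1} p(\underline{y}_1|\underline{x},\underline{y}_2) K(\underline{x},\underline{y}_1,\underline{y}_2)\right)^{2^{NR_1}} \]

\[ \le 1 - \sum_{\underline{y}_1} p(\underline{y}_1|\underline{x},\underline{y}_2) K(\underline{x},\underline{y}_1,\underline{y}_2) + e^{-2^{N(R_1 - I(X;Y_1|Y_2) - 4a_2(\epsilon_2))}} \]

\[ = p\left((\hat{A}_\epsilon^{(N)}(\underline{x},\underline{Y}_1,\underline{Y}_2))^c \,\middle|\, \underline{x},\underline{y}_2\right) + e^{-2^{N(R_1 - I(X;Y_1|Y_2) - 4a_2(\epsilon_2))}}. \;\; ■ \]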
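
The elementary inequality $(1 - ab)^n \le 1 - a + e^{-bn}$ for $0 \le a, b \le 1$ and $n \ge 1$, used in the proof above, can be spot-checked numerically. The sketch below is only a sanity check on a finite grid, not a proof; the function name `slack` and the grid are chosen for this example.

```python
import math

def slack(a, b, n):
    """Right side minus left side of (1 - a*b)^n <= 1 - a + exp(-b*n)."""
    return (1.0 - a + math.exp(-b * n)) - (1.0 - a * b) ** n

grid = [k / 20 for k in range(21)]                 # a and b range over 0, 0.05, ..., 1
exponents = (1, 2, 10, 100, 10 ** 4)               # stands in for the codebook size 2^{N R}
worst = min(slack(a, b, n) for a in grid for b in grid for n in exponents)
print(worst)                                       # non-negative up to rounding error
```

In the proof, $a$ plays the role of the probability of the restricted typical set, $b$ the exponentially small probability ratio, and $n = 2^{NR_1}$ the number of codewords, so the term $e^{-bn}$ becomes doubly exponentially small once the rate exceeds the mutual information.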
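Appendix IV 
Multiple Access Channels 

The following definitions, used in the proof of Theorem 6, below, rely on notation defined in Appendix II. 
Given any $p(u,x_1,x_2,y) = p(u|x_1)p(x_1,x_2)p(y|x_1,x_2)$, fix $\epsilon = (\epsilon_1, \epsilon_2)$ with $\epsilon_1, \epsilon_2 > 0$. Let 

\[ a_1(\epsilon_1) \stackrel{\rm def}{=} (1 + \epsilon_1) \cdot \inf\left\{\epsilon' > 0 : p\left(f(\underline{U},\underline{X}_1) > \epsilon' \vee f(\underline{U}) > \epsilon'\right) \le 2^{-N6\epsilon_1} \;\; \forall N \text{ suff. large}\right\} \quad (5) \]

\[ a_2(\epsilon_2) \stackrel{\rm def}{=} (1 + \epsilon_2) \cdot \inf\left\{\epsilon' > 0 : p\left(f(\underline{Y}) > \epsilon' \vee f(\underline{U},\underline{Y}) > \epsilon' \vee f(\underline{U},\underline{X}_1,\underline{X}_2) > \epsilon' \vee f(\underline{X}_1,\underline{X}_2,\underline{Y}) > \epsilon' \vee f(\underline{U},\underline{X}_1,\underline{X}_2,\underline{Y}) > \epsilon'\right) \le 2^{-N6\epsilon_2} \;\; \forall N \text{ suff. large}\right\} \quad (6) \]

The typical sets for $p(u,x_1)$, $p(u,x_1,x_2,y)$, and $p(x_1,x_2,y)$ are 

\[ A_\epsilon^{(N)}(U,X_1) \stackrel{\rm def}{=} \{(\underline{u},\underline{x}_1) : f(\underline{x}_1) \le \epsilon_1, \; f(\underline{u}) \le a_1(\epsilon_1), \; f(\underline{u},\underline{x}_1) \le a_1(\epsilon_1)\} \]
\[ A_\epsilon^{(N)}(U,X_1,X_2,Y) \stackrel{\rm def}{=} \{(\underline{u},\underline{x}_1,\underline{x}_2,\underline{y}) : f(\underline{u},\underline{x}_1,\underline{x}_2) \le a_2(\epsilon_2), \; f(\underline{u},\underline{x}_1,\underline{x}_2,\underline{y}) \le a_2(\epsilon_2), \; f(\underline{u},\underline{y}) \le a_2(\epsilon_2), \; f(\underline{y}) \le a_2(\epsilon_2)\} \]
\[ A_\epsilon^{(N)}(X_1,X_2,Y) \stackrel{\rm def}{=} \{(\underline{x}_1,\underline{x}_2,\underline{y}) : f(\underline{x}_1,\underline{x}_2) \le \epsilon_2, \; f(\underline{y}) \le a_2(\epsilon_2), \; f(\underline{x}_1,\underline{x}_2,\underline{y}) \le a_2(\epsilon_2)\}, \]

which we restrict as 

\[ \hat{A}_\epsilon^{(N)}(U,X_1) \stackrel{\rm def}{=} \left\{(\underline{u},\underline{x}_1) \in A_\epsilon^{(N)} : p\left((A_\epsilon^{(N)}(\underline{U},\underline{X}_1))^c \,\middle|\, \underline{x}_1\right) \le 2^{-3N\epsilon_1}\right\} \]
\[ \hat{A}_\epsilon^{(N)}(U,X_1,X_2,Y) \stackrel{\rm def}{=} \left\{(\underline{u},\underline{x}_1,\underline{x}_2,\underline{y}) \in A_\epsilon^{(N)} : p\left((A_\epsilon^{(N)}(\underline{U},\underline{X}_1,\underline{X}_2,\underline{Y}))^c \,\middle|\, (\underline{u},\underline{x}_1,\underline{x}_2)\right) \le 2^{-3N\epsilon_2}\right\} \]
\[ \hat{A}_\epsilon^{(N)}(X_1,X_2,Y) \stackrel{\rm def}{=} \left\{(\underline{x}_1,\underline{x}_2,\underline{y}) \in A_\epsilon^{(N)} : p\left((A_\epsilon^{(N)}(\underline{X}_1,\underline{X}_2,\underline{Y}))^c \,\middle|\, (\underline{x}_1,\underline{x}_2)\right) \le 2^{-3N\epsilon_2}\right\}. \]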

Lemma 18 bounds the probability that i.i.d. samples from $p(u,x_1,x_2,y)$ are atypical. 

Lemma 18 If $(\underline{U},\underline{X}_1,\underline{X}_2,\underline{Y})$ are drawn i.i.d. $p(u,x_1,x_2,y)$, then 

\[ p\left((\hat{A}_\epsilon^{(N)}(U,X_1))^c\right) \le 2^{-Nc_1(\epsilon_1)} \]
\[ p\left((\hat{A}_\epsilon^{(N)}(U,X_1,X_2,Y))^c \cup (\hat{A}_\epsilon^{(N)}(X_1,X_2,Y))^c\right) \le 2^{-Nc_2(\epsilon_2)} \]

for some $c_1(\epsilon_1), c_2(\epsilon_2) > 0$ and all $N$ sufficiently large. Constants $c_1(\epsilon_1)$ and $c_2(\epsilon_2)$ approach 0 as $\epsilon_1$ and 
$\epsilon_2$, respectively, approach 0. 

Proof. Like Lemma 12, the result follows from Chernoff's bound and the definition of $\hat{A}_\epsilon^{(N)}$. ■ 



Proof of Theorem 6: Since $R^{(\{i_2\} \to \{j\})}$ is not bounded from below, we set it to 0. For concision, we 
further define $R_1 \stackrel{\rm def}{=} R^{(\{i_1\} \to \{j\})}$ and $R_2 \stackrel{\rm def}{=} R^{(\{i_1,i_2\} \to \{j\})}$ and use $C = (\mathcal{X}_1 \times \mathcal{X}_2, p(y|x_1,x_2), \mathcal{Y})$ in place 
of $C = (\mathcal{X}^{(i_1,1)} \times \mathcal{X}^{(i_2,1)}, p(y^{(j,1)} | x^{(i_1,1)}, x^{(i_2,1)}), \mathcal{Y}^{(j,1)})$ both in this proof and its supporting lemmas. 

Fix $(R_1,R_2)$ to satisfy the theorem constraints. By Theorem 4, it suffices to show that for any channel 
input distribution $p(x_1,x_2)$ there exists a sequence of rate-$(R_1, R_2)$ random emulation codes $(\alpha_N, \beta_N)$ 
for which the resulting emulation distribution 

\[ \hat{p}(\underline{y}|\underline{x}_1,\underline{x}_2) = \Pr\left(\beta_N(\alpha_N(\underline{x}_1,\underline{x}_2)) = \underline{y}\right) \]
satisfies 

for some positive function $\eta(\nu)$ dependent on $p(x)$ for which $\eta(\nu)$ goes to zero as $\nu$ goes to zero. 
Fix any $p(x_1,x_2)$, and then choose $p(u|x_1)$ to satisfy the constraints on $R_1$ and $R_2$. Let 

\[ p(u,x_1,x_2,y) = p(u|x_1)p(x_1,x_2)p(y|x_1,x_2). \]



Recall that (ajv,/3jv) has encoders 



at rates i?i = Rifrh+V}) , R<X*>}-Hj}) = 0, and R 2 = R^M^ii}) an d decoder p N = /3^ } . Rate 
requires no encoder. We abbreviate the notation for the remaining encoders to a N = a^ and 

a N = aft 1 ' 12 . The code also relies on a mapping jjy. Thus the code defines a collection of 

mappings 

ffj : Xi "^ Wi /?}? : WixW 2 ^J 

where Wi = i_ {il} ^°' }) = {0, 1} NR ^ and W 2 = x {{ilM ^ {j}) = {0, 1} NR \ Encoder a$ operates at 
node ii # Encoder a]y is operates at node x Vl using inputs X x and X_ 2 losslessly received from nodes 
i\ and i 2 . The decoder is operated at node j. 

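The random code design draws $\{\gamma_N(w_1) : w_1 \in \mathcal{W}_1\}$ i.i.d. from the distribution $p(\underline{u})$. For each $w_1 \in \mathcal{W}_1$ 
set $\underline{u} = \gamma_N(w_1)$ and then draw $\{\beta_N(w_1,w_2) : w_2 \in \mathcal{W}_2\}$ i.i.d. from $p(\underline{y}|\underline{u}) = \prod_{\ell=1}^{N} p(y(\ell)|u(\ell))$. 
For the random encoder design, choose $\alpha_N^{(1)}(\underline{x}_1)$ uniformly at random from the indices $w_1$ for which 
$(\gamma_N(w_1), \underline{x}_1) \in \hat{A}_\epsilon^{(N)}$. If there is no such $w_1$, then set $\alpha_N^{(1)}(\underline{x}_1)$ to 1. For each $(\underline{x}_1,\underline{x}_2)$, let $w_1 = \alpha_N^{(1)}(\underline{x}_1)$ 
and $\underline{u} = \gamma_N(w_1)$, and choose $\alpha_N^{(2)}(\underline{x}_1,\underline{x}_2)$ uniformly at random from the indices $w_2$ for which 

\[ (\underline{x}_1,\underline{x}_2,\beta_N(w_1,w_2)) \in \hat{A}_\epsilon^{(N)}(X_1,X_2,Y) \]
\[ (\underline{u},\underline{x}_1,\underline{x}_2,\beta_N(w_1,w_2)) \in \hat{A}_\epsilon^{(N)}(U,X_1,X_2,Y); \]

if there is no such index, then $\alpha_N^{(2)}(\underline{x}_1,\underline{x}_2) = 0$. 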



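By Lemma 19, below, 

\[ \hat{p}(\underline{y}|\underline{x}_1,\underline{x}_2) \le 2^{N(4a_1(\epsilon_1) + 2\epsilon_1 + 8a_2(\epsilon_2) + 2/N)} p(\underline{y}|\underline{x}_1,\underline{x}_2) \]

for all $(\underline{x}_1,\underline{x}_2,\underline{y}) \in \hat{A}_\epsilon^{(N)}$. By Lemma 20, below, 

\[ \hat{p}\left((\hat{A}_\epsilon^{(N)}(X_1,X_2,Y))^c \,\middle|\, \underline{x}_1,\underline{x}_2\right) \le \delta_1 + \delta_2 + p\left((\hat{A}_\epsilon^{(N)}(U,X_1))^c \,\middle|\, \underline{x}_1\right) \]
\[ \qquad + 2^{N(2\epsilon_1 + 4a_1(\epsilon_1) + 1/N)} p\left((\hat{A}_\epsilon^{(N)}(U,X_1,X_2,Y))^c \cup (\hat{A}_\epsilon^{(N)}(X_1,X_2,Y))^c \,\middle|\, \underline{x}_1,\underline{x}_2\right), \]

where $\delta_1 \stackrel{\rm def}{=} e^{-2^{N(R_1 - I(U;X_1) - \epsilon_1 - 2a_1(\epsilon_1))}}$ and $\delta_2 \stackrel{\rm def}{=} e^{-2^{N(R_2 - I(X_1,X_2;Y|U) - 4a_2(\epsilon_2))}}$. By Lemma 18, above, 

\[ p\left((\hat{A}_\epsilon^{(N)}(U,X_1,X_2,Y))^c \cup (\hat{A}_\epsilon^{(N)}(X_1,X_2,Y))^c\right) \le 2^{-Nc_2(\epsilon_2)} \]

for some constants $c_1(\epsilon_1), c_2(\epsilon_2) > 0$ and all $N$ sufficiently large; constants $c_1(\epsilon_1)$ and $c_2(\epsilon_2)$ go to zero 
as $\epsilon_1$ and $\epsilon_2$ go to zero. 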

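Thus when $\nu = 4a_1(\epsilon_1) + 3\epsilon_1 + 8a_2(\epsilon_2)$ and $N$ is sufficiently large, 

\[ P_e^{(N)}(\nu) \le \sum_{(\underline{x}_1,\underline{x}_2,\underline{y}) \notin \hat{A}_\epsilon^{(N)}} p(\underline{x}_1,\underline{x}_2) \hat{p}(\underline{y}|\underline{x}_1,\underline{x}_2) \]

\[ \le \sum_{(\underline{x}_1,\underline{x}_2)} p(\underline{x}_1,\underline{x}_2) \Big(\delta_1 + \delta_2 + p\left((\hat{A}_\epsilon^{(N)}(U,X_1))^c \,\middle|\, \underline{x}_1\right) \]
\[ \qquad + 2^{N(2\epsilon_1 + 4a_1(\epsilon_1) + 1/N)} p\left((\hat{A}_\epsilon^{(N)}(U,X_1,X_2,Y))^c \cup (\hat{A}_\epsilon^{(N)}(X_1,X_2,Y))^c \,\middle|\, \underline{x}_1,\underline{x}_2\right)\Big) \]

\[ \le \delta_1 + \delta_2 + 2^{-Nc_1(\epsilon_1)} + 2^{-N(c_2(\epsilon_2) - 2\epsilon_1 - 4a_1(\epsilon_1) - 1/N)}. \]

Thus for all $N$ sufficiently large, $P_e^{(N)}(\nu)$ decays exponentially to zero provided that $\epsilon_1$ is chosen to 
satisfy $2a_1(\epsilon_1) + \epsilon_1 < R_1 - I(U;X_1)$ and $\epsilon_2$ is chosen to satisfy $4a_2(\epsilon_2) < R_2 - I(X_1,X_2;Y|U)$ and 
$c_2(\epsilon_2) > 2\epsilon_1 + 4a_1(\epsilon_1)$. The resulting exponent decays to zero as $\epsilon_1$ and $\epsilon_2$ decay to zero. 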

We next derive the bound on $|\mathcal{U}|$. For any fixed conditional distribution $p(x_1|u)$ on an alphabet $\mathcal{U}$ 
that is arbitrarily large, we can express the optimization of $U$ as a minimization of the Lagrangian 
$I(X_1;U) + \nu I(X_1,X_2;Y|U)$ over all $p(u)$, $u \in \mathcal{U}$, satisfying the constraints $p(u) \ge 0$ for all $u \in \mathcal{U}$, 
$\sum_{u \in \mathcal{U}} p(u) = 1$, and $\sum_{u \in \mathcal{U}} p(u)p(x_1|u) = p(x_1)$ for all but one $x_1 \in \mathcal{X}_1$,$^3$ where $\nu > 0$ is the Lagrangian 
constant. The Lagrangian and the constraints are linear in $p(u)$, so this is a linear program. For every linear 
program, there exists a solution on the boundary of the constrained region. Therefore, given $|\mathcal{U}|$ variables, 
there exists a minimizing distribution $p(u)$ that satisfies $|\mathcal{U}|$ of the given constraints with equality. We 
have one constraint $\sum_{u \in \mathcal{U}} p(u) = 1$ and $|\mathcal{X}_1| - 1$ constraints of form $\sum_{u \in \mathcal{U}} p(u)p(x_1|u) = p(x_1)$, so at 
least $|\mathcal{U}| - |\mathcal{X}_1|$ constraints of the form $p(u) \ge 0$ are met with equality. This implies $p(u) > 0$ for at most 
$|\mathcal{X}_1|$ values of $u$, which gives the desired bound on $|\mathcal{U}|$. ■ 

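Lemma 19 For all $(\underline{u},\underline{x}_1) \in \hat{A}_\epsilon^{(N)}$, 

\[ \hat{p}(\underline{u}|\underline{x}_1) \le 2^{N(4a_1(\epsilon_1) + 2\epsilon_1 + 1/N)} p(\underline{u}|\underline{x}_1); \]

if, further, $(\underline{u},\underline{x}_1,\underline{x}_2,\underline{y}) \in \hat{A}_\epsilon^{(N)}$ and $(\underline{x}_1,\underline{x}_2,\underline{y}) \in \hat{A}_\epsilon^{(N)}$, then 

\[ \hat{p}(\underline{y}|\underline{u},\underline{x}_1,\underline{x}_2) \le 2^{N(8a_2(\epsilon_2) + 1/N)} p(\underline{y}|\underline{u},\underline{x}_1,\underline{x}_2). \]

Thus, for all $(\underline{x}_1, \underline{x}_2, \underline{y}) \in \hat{A}_\epsilon^{(N)}$, 

\[ \hat{p}(\underline{y}|\underline{x}_1,\underline{x}_2) \le 2^{N(4a_1(\epsilon_1) + 2\epsilon_1 + 8a_2(\epsilon_2) + 2/N)} p(\underline{y}|\underline{x}_1,\underline{x}_2). \]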

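Proof. The first bound follows immediately from Lemma 13. For the second bound, recall that the second 
encoder observes both $\underline{x}_1$ and $\underline{x}_2$ and looks for a match among codewords drawn according to $p(\underline{y}|\underline{u})$. The 
second bound follows an argument similar to the first, just accounting for these minor differences. Note that 
$\hat{p}(\underline{u}|\underline{x}_1) = \hat{p}(\underline{u}|\underline{x}_1,\underline{x}_2)$ for the given code design. Likewise $p(\underline{u}|\underline{x}_1) = p(\underline{u}|\underline{x}_1,\underline{x}_2)$ since $U \to X_1 \to X_2$ 
forms a Markov chain. Note further that each encoder chooses an index 0 if it fails to find a matching 
codeword, and there is no codeword defined for this index; this choice guarantees that the source code's input $(\underline{x}_1, \underline{x}_2)$ 
and output $\underline{y}$ are jointly typical only if both encoders succeed in finding jointly typical codewords, that is, 
if the conditions of the first two inequalities are met. Therefore 

\[ \hat{p}(\underline{y}|\underline{x}_1,\underline{x}_2) = \sum_{\underline{u}} \hat{p}(\underline{u}|\underline{x}_1) \hat{p}(\underline{y}|\underline{u},\underline{x}_1,\underline{x}_2) \le \sum_{\underline{u}} p(\underline{u},\underline{y}|\underline{x}_1,\underline{x}_2) \, 2^{N(4a_1(\epsilon_1) + 2\epsilon_1 + 8a_2(\epsilon_2) + 2/N)}. \;\; ■ \]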



3 If $\sum_{u \in \mathcal{U}} p(u) = 1$ and $\sum_{u \in \mathcal{U}} p(u)p(x_1|u) = p(x_1)$ for all but one $x_1 \in \mathcal{X}_1$, then $\sum_{u \in \mathcal{U}} p(u)p(x_1|u) = p(x_1)$ for the 
remaining $x_1 \in \mathcal{X}_1$ as well. 
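Lemma 20 For all $(\underline{x}_1, \underline{x}_2) \in \mathcal{X}_1^N \times \mathcal{X}_2^N$, 

\[ \hat{p}\left((\hat{A}_\epsilon^{(N)}(X_1,X_2,Y))^c \,\middle|\, \underline{x}_1,\underline{x}_2\right) \le \delta_1 + \delta_2 + p\left((\hat{A}_\epsilon^{(N)}(U,X_1))^c \,\middle|\, \underline{x}_1\right) \]
\[ \qquad + 2^{N(2\epsilon_1 + 4a_1(\epsilon_1) + 1/N)} p\left((\hat{A}_\epsilon^{(N)}(U,X_1,X_2,Y))^c \cup (\hat{A}_\epsilon^{(N)}(X_1,X_2,Y))^c \,\middle|\, \underline{x}_1,\underline{x}_2\right), \]

where $\delta_1 = e^{-2^{N(R_1 - I(U;X_1) - \epsilon_1 - 2a_1(\epsilon_1))}}$ and $\delta_2 = e^{-2^{N(R_2 - I(X_1,X_2;Y|U) - 4a_2(\epsilon_2))}}$. 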

Pmo/ JfpCC^C^i, ^2,^)) c |fe,^ 2 )) > 2" 3A S thenpdAi^rix,,^) = pdA^rix,,^) = 1 
by the definition of A\ (X\ ,X 2 ,Y) and the bound is satisfied. Otherwise, (x 1 , x 2 , Y_) A € implies that 
one or both of the encoders a N anda N failed to find a matching codeword for (x x ,x 2 ). Encoder a N fails 
if there is no jointly typical codeword for x x in codebook {7jv(1), . . . , 7jv(2 JVi?1 )}. Otherwise, let w\ = 
a 7V (— i) ar >du = 7jv(«Ji). Then encoder a N fails if no codeword in {j3^{w\, 1),... ,/3n(wi,2 NRi )} is 
jointly typical with {u,x x ,x 2 ). Therefore 

p({X x ,X 2l Y) tAWKx^xj = (x u x 2 ] 



< l^Piudil-Kiu,^))] + Yl Pi'M.hi) \y2,P{y\^){^-K{u,x l ,x 2 ,y))\ 

\ u J u--K(u,x 1 )=l \ y J 

By the usual probability bounds for elements of the typical set, 

P(u) > p(n|x 1 )2- Ar ( / ( C7 ' Xl )+ ei+2ai ( ei )) whenK(u,x 1 ) = l 

p(y\u) > p(y\u,x 1 ,x 2 )2- N( - I< - Xl < X2 '< Y W + ' ia2( - e2 » when K^x^x^y) = 1. 
Applying these bounds, the bound (1 — ab) n < 1 — a + e~ bn , and Lemma [I"9l gives 

p ((2Li,2L 2 ,Y) AY\x u x 2 ) = (x 1; x 2 )) 

< 1 — \]p(u\xi)K(u,Xi) + e~ 2 1 ' + /Jif (;u,%i)p(m|«£i) 



2™ R 2 



l-^K(u,x 1 ,x 2 ,y)K(x 1 ,x 2 ,uy)p(y\u,x 1 ,x 2 ) + e 2 x - ,A 

y 

^ II a(N)/TT V \\C\ \ , _ 2 N(R 1 --T((7;X 1 )- E1 -2a 1 ( ei )) _ 2 «(R 2 - /(Xj ,X 2 ;i-| t/)-4a 2 (« 2 )) 

< p{(A\ >{U,Xi)) \x l ) + e +e 

+ 2 N (^+^(^+i/ N )p((AW{U,X 1 ,X 2 ,Y)) c \x 1 ,x 2 ). 



Appendix V 
Interference Channels: Model 1 

The following definitions, used in the proof of Theorem 7, below, rely on notation defined in Appendix II. 
Given any distribution $p(u_1,u_2,x_1,x_2,y_1,y_2) = p(u_2|x_1)p(u_1|u_2,x_1)p(x_1,x_2)p(y_1,y_2|x_1,x_2)$, fix $\epsilon = 
(\epsilon_1,\epsilon_2,\epsilon_3,\epsilon_4)$ with $\epsilon_1,\epsilon_2,\epsilon_3,\epsilon_4 > 0$. Define 

0l (ei) d = (1 + ei) • inf {e' > : p (f(U 2 ) > e' V f(U 2 ,X.i) > e') < 2 _JV6£l ViV suff. large} 

a 2 (e 2 ) d ^ f (1 + e 2 ) • inf |e' > : p (/(t/ 1; U 2 ) > d V /(U^U^X,) > e') < 2^« ViV suff. large} 

a 3 (e 3 ) = (1 + e 3 (t)) ■ hif {e' > : Pr (f(U 2 ) > e' V /(C/ 2 ,r 2 ) > e' V /(^ 2 ,X 1; X 2 ) > eV 

/(t/ 2 ,^i,^ 2 ,Z 2 ) > e') < 2- JV6e 'W ViV suff. large} 
a 4 (e 4 ) = f (1 + e 4 (t)) ■ mf{e' > : Vr{f {U^U^Y^ > e' V /(t^, C/ 2 ,ri,r 2 ) > e' V 

/(&, ^2,^1,^2,^2) > e' V /(f/i,^ 2 ,^i,^ 2 ,i:i,Z 2 ) > e' V /(F^Z,) > e' V 

/(^i,^ 2 ,Zi,Z 2 ) > < 2~ me ^ ViV suff. large}. 
The corresponding typical sets are 

A^{U 2 ,X{) d ^ f {(u 2 ,x 1 ):f(x 1 )<e 1 ,f(u 2 ),f(u 2 ,x 1 )<a 1 (e 1 )} 
A[ N \Ui,U 2 ,X 1 ) = {(«i,M 2 ,^i) : /(M 2 ),/(«i,^ 2 ),/fe,M 2 ),/(Mi,M 2 ,^i) < 02(62)} 

^(^2, Xi,X 2 ,y 2 ) = f {(tfe.^l.Sj.l/a) : /(«2)./(«2.|/2)./(W2.£l.S2). 

/(« 2 ,£i,£ 2 ,2/ 2 ) < 03(63)} 

A( JV )(c/i,c/2,Xi,x 2 ,yi,i2) d = |(mi,:w 2 ,zi,z 2 , 2^,2/2) : f(ui,u2,y 2 ), f{ui,u2>y v y 2 ), 

f(ui,U2,x 1 ,x 2 ,y 2 )J(u 1 ,u 2 ,x 1 ,x 2 ,y v y 2 ) < a 4 (e 4 )} 
^(X!,^,^,^) d ^ f {fe,^,^,^) : /(xx.xa) < e 4 (t), 

f{y v y 2 )J{x.i,x.2iy v y 2 ) < 04(64)} , 

which we restrict as 

i^^^i) d = f {(n 2 ,x 1 )€4 iV) :Pr((Af)(^ 2 ,X 1 )) c |x 1 )<2- 3 ^} 

if)^!,^,^) d ^ f {( fil ,« a ,x 1 )e^:Pr((4 w )(«7 1> ^,X 1 )) c |« 2 ,x 1 ) <2" 3 ^} 

if)^,^,^,^) ^ {(s fa> x 1> x 2> y 2 )€^:Pr((^r|ti 2> x 1 ,x 2 ) <2' 3N ^} 

Af\u 1 ,U 2 ,X 1 ,X 2 ,Y 1 ,Y 2 ) = {(Mi,M 2 ,x 1 ,x 2 ,y 1 ,y 2 )G4 Ar ):Pr((4 Ar )) c K,u 2 ,x 1 ,x 2 ,y 2 

< 2 -3Are 4 (t)l 

if^^y,^) d ^ f {OEx.a^.^e^^^riai.a) <2" 3 ^)}. 






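Lemma 21 If $(\underline{U}_1, \underline{U}_2, \underline{X}_1, \underline{X}_2, \underline{Y}_1, \underline{Y}_2)$ are drawn i.i.d. $p(u_1,u_2,x_1,x_2,y_1,y_2)$, then there exist positive 
constants $c_1(\epsilon_1)$, $c_2(\epsilon_2)$, $c_3(\epsilon_3)$, and $c_4(\epsilon_4)$ for which 

\[ \Pr\left((\hat{A}_\epsilon^{(N)}(U_2,X_1))^c\right) \le 2^{-Nc_1(\epsilon_1)} \]
\[ \Pr\left((\hat{A}_\epsilon^{(N)}(U_1,U_2,X_1))^c\right) \le 2^{-Nc_2(\epsilon_2)} \]
\[ \Pr\left((\hat{A}_\epsilon^{(N)}(U_2,X_1,X_2,Y_2))^c\right) \le 2^{-Nc_3(\epsilon_3)} \]
\[ \Pr\left((\hat{A}_\epsilon^{(N)}(U_1,U_2,X_1,X_2,Y_1,Y_2))^c \cup (\hat{A}_\epsilon^{(N)}(X_1,X_2,Y_1,Y_2))^c\right) \le 2^{-Nc_4(\epsilon_4)} \]

for all $N$ sufficiently large. Constant $c_k(\epsilon, t)$ approaches 0 as $\epsilon_k(t)$ decays to 0. 
Proof. Like Lemma 12, the result follows from Chernoff's bound and the definition of $\hat{A}_\epsilon^{(N)}$. ■ 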

Proof of Theorem S We set the rates fl^-HM), fl({*»}-H*}) t u({fe}-»-{Ja}), u({<»}-Kj'i^}), and 
j£U>i,*2}-Hi2}) f or w hich no bounds are given to zero, simplify remaining notation as R\\ = .Ru* 1 } - *"^}), 
R 12 d ^ f ij({^{i.*}), i? 21 d ^ f fl ({iiA}-H*}) f and it> 22 d ^ f #*Hfc*}) and use C = (*i x 
<*2, P(yi,y2ki,x 2 ), ^1X^2) instead of (X^^ xX^ l \p(y^ l \y(J^\x^' 1 \x^'^),y^ 1 '^xy^ 2 '^) 
in this proof and its supporting lemmas. 

Fix (Rn, Rn, R21, R22) to satisfy the theorem constraints. Letp(xi, ^2) be arbitrary, and choose p(u 2 |xi) 
and p(ui|xi,u 2 ) to satisfy the given bounds. Let 

p(ui,U2,xi,x 2 ,yi,y2) = p(u2\xi)p(ui\xi,u 2 )p(xi,x2)p(yi,y2\xi,x2). 
We define corresponding (restricted) typical sets in Appendix [V] 

Excluding the rate-0 codes, four encoders and two decoders are required. We simplify their notation as 
„(n) _,({*i}-Kii}) -,(21) „({*iM-KJi}) fl (i) flO'O 

AT — N N — N PN — PN 



(12) _ „({ii}->{iij a }) (22) _ r{ii,ia}"Kjirf»}) «(2) _ R (h) 



where 



u { n ] ■ Xi -* W11 a^ 1} : ^ x * 2 ->■ W21 $J } : Wu x W12 x W21 x W 22 ->• 3^ 

a£ 2) : #1 -> W12 a% 2) : X x x ^ ->■ W 22 /3^ } : W i2 x W 22 -»• ^ 2 

and 

Wu = ^ ({4l} ^ {jl}) = {0,l}^" VW12 = A? ({il} ^ {il ' i2}) = {0, 1}**« 

W21 = x {{llM ^ {jl}) = {0,1} NR ^ W22 = ^^ } ^ {il ' i2}) = {0,l}^- 

Encoder $(\alpha_N^{(11)}, \alpha_N^{(12)})$ operates at node $i_1$, transmitting its rate $R_{11}$ and $R_{12}$ descriptions to node $j_1$ and both nodes, respectively. Encoder $(\alpha_N^{(21)}, \alpha_N^{(22)})$ operates at node v Vl, receiving noiseless descriptions of $\underline x_1$ and $\underline x_2$ from nodes $i_1$ and $i_2$ and transmitting its rate $R_{21}$ output to node $j_1$ and its rate $R_{22}$ output to both nodes. The code also employs mappings $\gamma_N^{(1)} : \mathcal W_{11} \times \mathcal W_{12} \to \mathcal U_1^N$ and $\gamma_N^{(2)} : \mathcal W_{12} \to \mathcal U_2^N$.

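The random code design draws codewords $\{\gamma_N^{(2)}(w_{12}) : w_{12} \in \mathcal W_{12}\}$ i.i.d. from distribution $\prod_{\ell=1}^N p(u_2(\ell))$. For each $w_{12} \in \mathcal W_{12}$, let $\underline U_2 = \gamma_N^{(2)}(w_{12})$ and draw codewords $\{\gamma_N^{(1)}(w_{11},w_{12}) : w_{11} \in \mathcal W_{11}\}$ i.i.d. from $\prod_{\ell=1}^N p(u_1(\ell)|U_2(\ell))$ and codewords $\{\beta_N^{(2)}(w_{12},w_{22}) : w_{22} \in \mathcal W_{22}\}$ i.i.d. from $\prod_{\ell=1}^N p(y_2(\ell)|U_2(\ell))$. Finally, for each $(w_{11},w_{12},w_{22}) \in \mathcal W_{11} \times \mathcal W_{12} \times \mathcal W_{22}$, let

$$(\underline U_1, \underline U_2, \underline Y_2) = \big(\gamma_N^{(1)}(w_{11},w_{12}),\ \gamma_N^{(2)}(w_{12}),\ \beta_N^{(2)}(w_{12},w_{22})\big),$$

and draw $\{\beta_N^{(1)}(w_{11},w_{12},w_{21},w_{22}) : w_{21} \in \mathcal W_{21}\}$ i.i.d. from $\prod_{\ell=1}^N p(y_1(\ell)|U_1(\ell),U_2(\ell),Y_2(\ell))$.

For the encoder design, choose $\alpha_N^{(12)}(\underline x_1)$ uniformly at random from those $w_{12} \in \mathcal W_{12}$ for which $(\gamma_N^{(2)}(w_{12}), \underline x_1) \in \hat A_\epsilon^{(N)}$; if there is no such $w_{12}$, then set $\alpha_N^{(12)}(\underline x_1)$ to 0. Let $w_{12}$ be the chosen index and $\underline U_2 = \gamma_N^{(2)}(w_{12})$. Choose $\alpha_N^{(11)}(\underline x_1)$ uniformly at random from the set of $w_{11} \in \mathcal W_{11}$ for which $(\gamma_N^{(1)}(w_{11},w_{12}), \underline U_2, \underline x_1) \in \hat A_\epsilon^{(N)}$; if there is no such $w_{11}$, then set $\alpha_N^{(11)}(\underline x_1)$ to 0. Let $w_{11}$ be the chosen index and $\underline U_1 = \gamma_N^{(1)}(w_{11},w_{12})$. Then choose $\alpha_N^{(22)}(\underline x_1,\underline x_2)$ uniformly at random from the set of $w_{22} \in \mathcal W_{22}$ for which

$$\big(\underline U_2, \underline x_1, \underline x_2, \beta_N^{(2)}(w_{12},w_{22})\big) \in \hat A_\epsilon^{(N)};$$

if this set is empty, set $\alpha_N^{(22)}(\underline x_1,\underline x_2) = 0$. Then let $w_{22}$ be the chosen index and $\underline Y_2 = \beta_N^{(2)}(w_{12},w_{22})$, and choose $\alpha_N^{(21)}(\underline x_1,\underline x_2)$ uniformly at random from the set of $w_{21} \in \mathcal W_{21}$ for which

$$\big(\underline U_1, \underline U_2, \underline x_1, \underline x_2, \beta_N^{(1)}(w_{11},w_{12},w_{21},w_{22}), \underline Y_2\big) \in \hat A_\epsilon^{(N)};$$

if this set is empty, set $\alpha_N^{(21)}(\underline x_1,\underline x_2)$ to 0.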
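The selection rule shared by all four encoders (pick uniformly among the indices whose codeword is jointly typical with the observed input, and fall back to a fixed index when none exists) can be sketched as follows. This is only a toy illustration under stated assumptions: the typicality test below compares empirical pair frequencies against a joint pmf, which stands in for the restricted typical sets of this appendix, and the names `is_jointly_typical`, `encode`, and `default_index` are hypothetical.

```python
import random
from collections import Counter


def empirical_joint(u, x):
    """Empirical joint distribution of the symbol pairs (u[l], x[l])."""
    n = len(u)
    return {pair: count / n for pair, count in Counter(zip(u, x)).items()}


def is_jointly_typical(u, x, p_joint, eps):
    """Toy test (an assumption, not the paper's set): every empirical
    pair frequency lies within eps of the target joint pmf."""
    emp = empirical_joint(u, x)
    pairs = set(emp) | set(p_joint)
    return all(abs(emp.get(k, 0.0) - p_joint.get(k, 0.0)) <= eps for k in pairs)


def encode(x, codebook, p_joint, eps, rng, default_index=0):
    """Choose uniformly among indices whose codeword is typical with x;
    return default_index when no such index exists."""
    candidates = [w for w, u in enumerate(codebook)
                  if is_jointly_typical(u, x, p_joint, eps)]
    return rng.choice(candidates) if candidates else default_index


# Tiny hand-built codebook; in the proof the codewords are drawn at random.
codebook = [(0, 0, 1, 1), (0, 1, 0, 1), (1, 1, 0, 0)]
p_joint = {(0, 0): 0.5, (1, 1): 0.5}
rng = random.Random(0)
chosen = encode((1, 1, 0, 0), codebook, p_joint, 0.1, rng)    # only index 2 is typical
fallback = encode((0, 1, 1, 0), codebook, p_joint, 0.1, rng)  # no typical codeword
```

In the construction above the four encoders apply this rule in sequence, each conditioning its typicality test on the codewords already selected by the earlier ones.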

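By Lemma 22 below,

$$\hat p(\underline y_1,\underline y_2|\underline x_1,\underline x_2) \le 2^{N(2\sum_{k=1}^4 b_k(\epsilon_k)+4/N)}\, p(\underline y_1,\underline y_2|\underline x_1,\underline x_2) \qquad \forall (\underline x_1,\underline x_2,\underline y_1,\underline y_2) \in \hat A_\epsilon^{(N)},$$

where $b_1(\epsilon_1) = \epsilon_1 + 2a_1(\epsilon_1)$, and $b_k(\epsilon_k) = 4a_k(\epsilon_k)$ for $k \in \{2,3,4\}$. By Lemma 23 below,

$$\hat p\big((\hat A_\epsilon^{(N)}(\underline X_1,\underline X_2,\underline Y_1,\underline Y_2))^c \,\big|\, \underline x_1,\underline x_2\big)$$

$$\le \delta_{11}+\delta_{12}+\delta_{21}+\delta_{22} + p\big((\hat A_\epsilon^{(N)}(\underline U_2,\underline X_1))^c \,\big|\, \underline x_1\big) + 2^{N(2b_1(\epsilon_1)+1/N)}\, p\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline U_2,\underline X_1))^c \,\big|\, \underline x_1\big)$$

$$+\ 2^{N(2\sum_{k=1}^2 b_k(\epsilon_k)+2/N)}\, p\big((\hat A_\epsilon^{(N)}(\underline U_2,\underline X_1,\underline X_2,\underline Y_2))^c \,\big|\, \underline x_1,\underline x_2\big)$$

$$+\ 2^{N(2\sum_{k=1}^3 b_k(\epsilon_k)+3/N)}\, p\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline U_2,\underline X_1,\underline X_2,\underline Y_1,\underline Y_2))^c \cup (\hat A_\epsilon^{(N)}(\underline X_1,\underline X_2,\underline Y_1,\underline Y_2))^c \,\big|\, \underline x_1,\underline x_2\big),$$

where

$$\delta_{11} = e^{-2^{N(R_{11}-I(X_1;U_1|U_2)-b_2(\epsilon_2))}}, \qquad \delta_{21} = e^{-2^{N(R_{21}-I(X_1,X_2;Y_1|U_1,U_2,Y_2)-b_4(\epsilon_4))}},$$

$$\delta_{12} = e^{-2^{N(R_{12}-I(X_1;U_2)-b_1(\epsilon_1))}}, \qquad \delta_{22} = e^{-2^{N(R_{22}-I(X_1,X_2;Y_2|U_2)-b_3(\epsilon_3))}}.$$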
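By Lemma 21 above,

$$p\big((\hat A_\epsilon^{(N)}(\underline U_2,\underline X_1))^c\big) \le 2^{-Nc_1(\epsilon_1)}$$

$$p\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline U_2,\underline X_1))^c\big) \le 2^{-Nc_2(\epsilon_2)}$$

$$p\big((\hat A_\epsilon^{(N)}(\underline U_2,\underline X_1,\underline X_2,\underline Y_2))^c\big) \le 2^{-Nc_3(\epsilon_3)}$$

$$p\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline U_2,\underline X_1,\underline X_2,\underline Y_1,\underline Y_2))^c \cup (\hat A_\epsilon^{(N)}(\underline X_1,\underline X_2,\underline Y_1,\underline Y_2))^c\big) \le 2^{-Nc_4(\epsilon_4)}$$

for all $N$ sufficiently large, where each $c_k(\epsilon_k)$ approaches 0 as $\epsilon_k$ approaches 0. So, if $\nu = 3\sum_{k=1}^4 b_k(\epsilon_k)$,

$$P_e^{(N)}(\nu) \le \delta_{11}+\delta_{12}+\delta_{21}+\delta_{22} + 2^{-Nc_1(\epsilon_1)} + 2^{-N(c_2(\epsilon_2)-2b_1(\epsilon_1)-1/N)}$$

$$+\ 2^{-N(c_3(\epsilon_3)-2\sum_{k=1}^2 b_k(\epsilon_k)-2/N)} + 2^{-N(c_4(\epsilon_4)-2\sum_{k=1}^3 b_k(\epsilon_k)-3/N)}$$

for $N$ sufficiently large. Thus sequentially choosing $\epsilon_4$, $\epsilon_3$, $\epsilon_2$, and $\epsilon_1$ to satisfy

$$b_4(\epsilon_4) < R_{21} - I(X_1,X_2;Y_1|U_1,U_2,Y_2)$$

$$b_3(\epsilon_3) < \min\{R_{22} - I(X_1,X_2;Y_2|U_2),\ c_4(\epsilon_4)/6\}$$

$$b_2(\epsilon_2) < \min\{R_{11} - I(X_1;U_1|U_2),\ c_4(\epsilon_4)/6,\ c_3(\epsilon_3)/4\}$$

$$b_1(\epsilon_1) < \min\{R_{12} - I(X_1;U_2),\ c_4(\epsilon_4)/6,\ c_3(\epsilon_3)/4,\ c_2(\epsilon_2)/2\}$$

yields an error probability $P_e^{(N)}(\nu)$ that decays exponentially to zero. The exponent approaches 0 as $\epsilon_1$, $\epsilon_2$, $\epsilon_3$, and $\epsilon_4$ approach 0, which gives the desired result by Theorem 4. ■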
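As a numerical aside, the role of the rate margins in the $\delta$ terms can be checked with a short sketch. It assumes only the form $\delta = e^{-2^{N(R - I - b)}}$ used above together with the standard inequality $(1-q)^M \le e^{-qM}$, which is one common way such a term arises when none of $M$ independently drawn codewords is typical; the function names and the numbers are illustrative, not taken from the paper.

```python
import math


def delta(n, rate, mutual_info, slack):
    """Double-exponential term exp(-2^{n (rate - mutual_info - slack)})."""
    return math.exp(-(2.0 ** (n * (rate - mutual_info - slack))))


def miss_probability(num_codewords, hit_prob):
    """Probability that none of num_codewords independent draws succeeds."""
    return (1.0 - hit_prob) ** num_codewords


# Rate above the mutual information plus slack: the term vanishes rapidly.
above = [delta(n, 0.5, 0.3, 0.05) for n in (10, 20, 40)]
# Rate below that threshold: the term climbs toward 1 instead.
below = [delta(n, 0.3, 0.3, 0.05) for n in (10, 20, 40)]
# The covering inequality (1 - q)^M <= exp(-q M) for one sample pair (M, q).
covering_ok = miss_probability(1000, 0.01) <= math.exp(-1000 * 0.01)
```

This is why each $b_k(\epsilon_k)$ is chosen strictly smaller than the corresponding rate margin: any positive margin makes the associated $\delta$ term decay doubly exponentially in $N$.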

Lemmas 22 and 23 bound the emulation distribution and the conditional probability of observing atypical strings using the code defined in the proof of Theorem 7.

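Lemma 22. For all $(\underline u_2, \underline x_1) \in \hat A_\epsilon^{(N)}$,

$$\hat p(\underline u_2|\underline x_1) \le 2^{N(4a_1(\epsilon_1)+2\epsilon_1+1/N)}\, p(\underline u_2|\underline x_1);$$

if, in addition, $(\underline u_1,\underline u_2,\underline x_1) \in \hat A_\epsilon^{(N)}$, then

$$\hat p(\underline u_1|\underline u_2,\underline x_1) \le 2^{N(8a_2(\epsilon_2)+1/N)}\, p(\underline u_1|\underline u_2,\underline x_1);$$

if, further, $(\underline u_2,\underline x_1,\underline x_2,\underline y_2) \in \hat A_\epsilon^{(N)}$, then

$$\hat p(\underline y_2|\underline u_2,\underline x_1,\underline x_2) \le 2^{N(8a_3(\epsilon_3)+1/N)}\, p(\underline y_2|\underline u_2,\underline x_1,\underline x_2);$$

if, also, $(\underline u_1,\underline u_2,\underline x_1,\underline x_2,\underline y_1,\underline y_2) \in \hat A_\epsilon^{(N)}$ and $(\underline x_1,\underline x_2,\underline y_1,\underline y_2) \in \hat A_\epsilon^{(N)}$, then

$$\hat p(\underline y_1|\underline u_1,\underline u_2,\underline x_1,\underline x_2,\underline y_2) \le 2^{N(8a_4(\epsilon_4)+1/N)}\, p(\underline y_1|\underline u_1,\underline u_2,\underline x_1,\underline x_2,\underline y_2).$$

For all $(\underline x_1,\underline x_2,\underline y_1,\underline y_2) \in \hat A_\epsilon^{(N)}$,

$$\hat p(\underline y_1,\underline y_2|\underline x_1,\underline x_2) \le 2^{N(4a_1(\epsilon_1)+2\epsilon_1+8\sum_{k=2}^4 a_k(\epsilon_k)+4/N)}\, p(\underline y_1,\underline y_2|\underline x_1,\underline x_2).$$

Proof. Applying Lemma 13 as in Lemmas 16 and 19 gives the first four bounds. We then apply the Markov structure imposed on $\hat p(\cdot)$ by the code design and the Markovity of the underlying distribution

$$p(u_1,u_2,x_1,x_2,y_1,y_2) = p(x_1,x_2)\,p(u_1,u_2|x_1)\,p(y_1,y_2|x_1,x_2)$$

to obtain the final bound. ■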

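Lemma 23. Let $b_1(\epsilon_1) = \epsilon_1 + 2a_1(\epsilon_1)$ and $b_k(\epsilon_k) = 4a_k(\epsilon_k)$ for $k = 2, 3$. Then

$$\hat p\big((\hat A_\epsilon^{(N)}(\underline X_1,\underline X_2,\underline Y_1,\underline Y_2))^c \,\big|\, \underline x_1,\underline x_2\big)$$

$$\le \delta_{11}+\delta_{12}+\delta_{21}+\delta_{22} + p\big((\hat A_\epsilon^{(N)}(\underline U_2,\underline X_1))^c \,\big|\, \underline x_1\big) + 2^{N(2b_1(\epsilon_1)+1/N)}\, p\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline U_2,\underline X_1))^c \,\big|\, \underline x_1\big)$$

$$+\ 2^{N(2\sum_{k=1}^2 b_k(\epsilon_k)+2/N)}\, p\big((\hat A_\epsilon^{(N)}(\underline U_2,\underline X_1,\underline X_2,\underline Y_2))^c \,\big|\, \underline x_1,\underline x_2\big)$$

$$+\ 2^{N(2\sum_{k=1}^3 b_k(\epsilon_k)+3/N)}\, p\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline U_2,\underline X_1,\underline X_2,\underline Y_1,\underline Y_2))^c \cup (\hat A_\epsilon^{(N)}(\underline X_1,\underline X_2,\underline Y_1,\underline Y_2))^c \,\big|\, \underline x_1,\underline x_2\big),$$

where

$$\delta_{11} = e^{-2^{N(R_{11}-I(X_1;U_1|U_2)-b_2(\epsilon_2))}}, \qquad \delta_{12} = e^{-2^{N(R_{12}-I(X_1;U_2)-b_1(\epsilon_1))}},$$

$$\delta_{21} = e^{-2^{N(R_{21}-I(X_1,X_2;Y_1|U_1,U_2,Y_2)-b_4(\epsilon_4))}}, \qquad \delta_{22} = e^{-2^{N(R_{22}-I(X_1,X_2;Y_2|U_2)-b_3(\epsilon_3))}}.$$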

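Proof. For notational brevity, let

$$K_1 = K(\underline u_2,\underline x_1), \qquad K_3 = K(\underline u_2,\underline x_1,\underline x_2,\underline y_2),$$

$$K_2 = K(\underline u_1,\underline u_2,\underline x_1), \qquad K_4 = K(\underline u_1,\underline u_2,\underline x_1,\underline x_2,\underline y_1,\underline y_2) \cdot K(\underline x_1,\underline x_2,\underline y_1,\underline y_2);$$

we rely on context to specify the values of arguments. If $(\underline x_1,\underline x_2,\underline y_1,\underline y_2)$ is not jointly typical, then one of the four encoders failed to find a jointly typical codeword. We bound the probability of such a failure for each encoder in turn and then apply Lemma 22 to bound $\hat p(\cdot)$, giving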



2) 
2 NR i2 , , 2 NRl1 



p((Ai N \x 1 ,X 2 ,Y 1 ,Y 2 )y\x 1 ,x 

< f^P(M 2 )(l-^l) ) +E^^2)(EK«ll« 2 )(l-^2) 
\ "a / M 2 \ Mi 

(> 2««22 
^ / 

2 N «21 

+ ^2 K 1 K 2 K 3 p(u 1 ,u 2 ,y 2 ) [^2p(y 1 \u 1 ,u 2 ,y 2 )0--K 4: ) 

«i,«2.y 2 \ v_ x 

< p((if)(C/ 2 ,X 1 )) C |x 1 ) +e -2-^-^^)"H(n)) 

+2 ^(2 &1 (e0+W) p(( i(iV) ([/i)[/2)Xi))C | a) + e -2«(« 11 -^^ ll -2,-2(,2,, 

_2 N ( R 21-J(^1.^2ii'l|C/l,U2,V2)-64(E4)) 

+e 



Appendix VI 
Interference Channels: Model 2 



The following definitions, used in the proof of Theorem 8 below, rely on notation defined in Appendix III. Given any distribution $p(u_1,u_2,x_1,x_2,y_1,y_2) = p(u_1|x_1)\,p(u_2|u_1,x_1)\,p(x_1,x_2)\,p(y_1,y_2|x_1,x_2)$, fix $\epsilon = (\epsilon_1,\epsilon_2,\epsilon_3,\epsilon_4)$ with $\epsilon_k > 0$ for all $k$. Let

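$$a_1(\epsilon_1) \overset{\text{def}}{=} (1+\epsilon_1) \cdot \inf\{\epsilon' > 0 : \Pr(f(\underline U_1) > \epsilon' \vee f(\underline U_1,\underline X_1) > \epsilon') \le 2^{-N6\epsilon_1}\ \forall N \text{ suff. large}\}$$

$$a_2(\epsilon_2) \overset{\text{def}}{=} (1+\epsilon_2) \cdot \inf\{\epsilon' > 0 : \Pr(f(\underline U_1) > \epsilon' \vee f(\underline U_1,\underline X_1) > \epsilon' \vee f(\underline U_1,\underline U_2) > \epsilon' \vee f(\underline U_1,\underline U_2,\underline X_1) > \epsilon') \le 2^{-N6\epsilon_2(t)}\ \forall N \text{ suff. large}\}$$

$$a_3(\epsilon_3) \overset{\text{def}}{=} (1+\epsilon_3(t)) \cdot \inf\{\epsilon' > 0 : \Pr(f(\underline U_1) > \epsilon' \vee f(\underline U_1,\underline Y_1) > \epsilon' \vee f(\underline U_1,\underline X_1,\underline X_2) > \epsilon' \vee f(\underline U_1,\underline X_1,\underline X_2,\underline Y_1) > \epsilon') \le 2^{-N6\epsilon_3(t)}\ \forall N \text{ suff. large}\}$$

$$a_4(\epsilon_4) \overset{\text{def}}{=} (1+\epsilon_4(t)) \cdot \inf\{\epsilon' > 0 : \Pr(f(\underline U_1,\underline U_2,\underline Y_1) > \epsilon' \vee f(\underline U_1,\underline U_2,\underline Y_1,\underline Y_2) > \epsilon' \vee f(\underline U_1,\underline U_2,\underline X_1,\underline X_2,\underline Y_1) > \epsilon' \vee f(\underline U_1,\underline U_2,\underline X_1,\underline X_2,\underline Y_1,\underline Y_2) > \epsilon' \vee f(\underline Y_1,\underline Y_2) > \epsilon' \vee f(\underline X_1,\underline X_2,\underline Y_1,\underline Y_2) > \epsilon') \le 2^{-N6\epsilon_4(t)}\ \forall N \text{ suff. large}\}.$$

The typical sets are defined as

$$A_\epsilon^{(N)}(U_1,X_1) \overset{\text{def}}{=} \{(\underline u_1,\underline x_1) : f(\underline x_1) \le \epsilon_1,\ f(\underline u_1), f(\underline u_1,\underline x_1) \le a_1(\epsilon_1)\}$$

$$A_\epsilon^{(N)}(U_1,U_2,X_1) \overset{\text{def}}{=} \{(\underline u_1,\underline u_2,\underline x_1) : f(\underline u_1), f(\underline u_1,\underline u_2), f(\underline u_1,\underline x_1), f(\underline u_1,\underline u_2,\underline x_1) \le a_2(\epsilon_2)\}$$

$$A_\epsilon^{(N)}(U_1,X_1,X_2,Y_1) \overset{\text{def}}{=} \{(\underline u_1,\underline x_1,\underline x_2,\underline y_1) : f(\underline u_1), f(\underline u_1,\underline y_1), f(\underline u_1,\underline x_1,\underline x_2), f(\underline u_1,\underline x_1,\underline x_2,\underline y_1) \le a_3(\epsilon_3)\}$$

$$A_\epsilon^{(N)}(U_1,U_2,X_1,X_2,Y_1,Y_2) \overset{\text{def}}{=} \{(\underline u_1,\underline u_2,\underline x_1,\underline x_2,\underline y_1,\underline y_2) : f(\underline u_1,\underline u_2,\underline y_1), f(\underline u_1,\underline u_2,\underline y_1,\underline y_2), f(\underline u_1,\underline u_2,\underline x_1,\underline x_2,\underline y_1), f(\underline u_1,\underline u_2,\underline x_1,\underline x_2,\underline y_1,\underline y_2) \le a_4(\epsilon_4)\}$$

$$A_\epsilon^{(N)}(X_1,X_2,Y_1,Y_2) \overset{\text{def}}{=} \{(\underline x_1,\underline x_2,\underline y_1,\underline y_2) : f(\underline x_1,\underline x_2) \le \epsilon_4(t),\ f(\underline y_1,\underline y_2), f(\underline x_1,\underline x_2,\underline y_1,\underline y_2) \le a_4(\epsilon_4)\},$$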
which we restrict as 

if^X) d = f {(«!,*!) € Af) : Pr ((^f)(t/ 1 ,X 1 )) c |x 1 ) < 2~ 3 ^} 
if^C^X) d ^ f {(tt 1 ,« a ,x 1 )€^:Pr((4 w )(«7 1> i7 2 ,X 1 )) c |a 1 ,x 1 ) <2~ 3 ^}. 

< 2 -3JVes(t)| 

if)(l7i,^ 2 ,Xi,X 2 ,yi,y 2 ) = {(Mi,M 2 ,x 1 ,x 2 ,y 1 ,y 2 )eAf):Pr((4 JV )) c K,u 2 ,x 1 ,x 2 ,y 1 ^ 

< 2 -3Are 4 (t)l 

if)^,^,^,^) d ^ f {(x^^^^^G^^Pr^r^,^) <2" 3 ^)} 
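Lemma 24 bounds the probability of observing elements outside of those typical sets. We omit the proof, which follows the same outline as the corresponding examples in prior sections.

Lemma 24. If $(\underline U_1,\underline U_2,\underline X_1,\underline X_2,\underline Y_1,\underline Y_2)$ are drawn i.i.d. $p(u_1,u_2,x_1,x_2,y_1,y_2)$, then there exist positive constants $c_k(\epsilon_k)$, $k \in \{1,2,3,4\}$, for which

$$\Pr\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline X_1))^c\big) \le 2^{-Nc_1(\epsilon_1)}$$

$$\Pr\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline U_2,\underline X_1))^c\big) \le 2^{-Nc_2(\epsilon_2)}$$

$$\Pr\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline X_1,\underline X_2,\underline Y_1))^c\big) \le 2^{-Nc_3(\epsilon_3)}$$

$$\Pr\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline U_2,\underline X_1,\underline X_2,\underline Y_1,\underline Y_2))^c \cup (\hat A_\epsilon^{(N)}(\underline X_1,\underline X_2,\underline Y_1,\underline Y_2))^c\big) \le 2^{-Nc_4(\epsilon_4)}$$

for all $N$ sufficiently large. Constant $c_k(\epsilon,t)$ approaches 0 as $\epsilon_k(t)$ approaches 0. ■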

Proof of Theorem 8. All rates not bounded in the theorem statement are set to zero. We simplify the remaining notation as $R_{11} \overset{\text{def}}{=} R^{(\{i_1\}\to\{j_1,j_2\})}$, $R_{12} \overset{\text{def}}{=} R^{(\{i_1\}\to\{j_2\})}$, $R_{21} \overset{\text{def}}{=} R^{(\{i_1,i_2\}\to\{j_1,j_2\})}$, and $R_{22} \overset{\text{def}}{=} R^{(\{i_1,i_2\}\to\{j_2\})}$. We use $C = (\mathcal X_1 \times \mathcal X_2,\ p(y_1,y_2|x_1,x_2),\ \mathcal Y_1 \times \mathcal Y_2)$ in place of the formal channel notation in this proof and its supporting lemmas.

Fix $(R_{11}, R_{12}, R_{21}, R_{22})$ to satisfy the theorem constraints. Let $p(x_1,x_2)$ be arbitrary, and choose $p(u_1|x_1)$ and $p(u_2|u_1,x_1)$ to satisfy the given bounds. Let

$$p(u_1,u_2,x_1,x_2,y_1,y_2) = p(u_1|x_1)\,p(u_2|x_1,u_1)\,p(x_1,x_2)\,p(y_1,y_2|x_1,x_2).$$

We apply the typical set definitions given above.

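Excluding the rate-0 codes, four encoders and two decoders are required. We simplify their notation as

$$\alpha_N^{(11)} = \alpha_N^{(\{i_1\}\to\{j_1,j_2\})}, \qquad \alpha_N^{(21)} = \alpha_N^{(\{i_1,i_2\}\to\{j_1,j_2\})}, \qquad \beta_N^{(1)} = \beta_N^{(j_1)},$$

$$\alpha_N^{(12)} = \alpha_N^{(\{i_1\}\to\{j_2\})}, \qquad \alpha_N^{(22)} = \alpha_N^{(\{i_1,i_2\}\to\{j_2\})}, \qquad \beta_N^{(2)} = \beta_N^{(j_2)},$$

where

$$\alpha_N^{(11)} : \mathcal X_1^N \to \mathcal W_{11}, \qquad \alpha_N^{(21)} : \mathcal X_1^N \times \mathcal X_2^N \to \mathcal W_{21}, \qquad \beta_N^{(1)} : \mathcal W_{11} \times \mathcal W_{21} \to \mathcal Y_1^N,$$

$$\alpha_N^{(12)} : \mathcal X_1^N \to \mathcal W_{12}, \qquad \alpha_N^{(22)} : \mathcal X_1^N \times \mathcal X_2^N \to \mathcal W_{22}, \qquad \beta_N^{(2)} : \mathcal W_{11} \times \mathcal W_{12} \times \mathcal W_{21} \times \mathcal W_{22} \to \mathcal Y_2^N,$$

and

$$\mathcal W_{11} = \mathcal W^{(\{i_1\}\to\{j_1,j_2\})} = \{0,1\}^{NR_{11}}, \qquad \mathcal W_{12} = \mathcal W^{(\{i_1\}\to\{j_2\})} = \{0,1\}^{NR_{12}},$$

$$\mathcal W_{21} = \mathcal W^{(\{i_1,i_2\}\to\{j_1,j_2\})} = \{0,1\}^{NR_{21}}, \qquad \mathcal W_{22} = \mathcal W^{(\{i_1,i_2\}\to\{j_2\})} = \{0,1\}^{NR_{22}}.$$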
Encoder $(\alpha_N^{(11)}, \alpha_N^{(12)})$ operates at node $i_1$, transmitting its rate $R_{11}$ and $R_{12}$ descriptions to both nodes and only $j_2$, respectively. Encoder $(\alpha_N^{(21)}, \alpha_N^{(22)})$ operates at node v Vl, receiving noiseless descriptions of $\underline x_1$ and $\underline x_2$ from nodes $i_1$ and $i_2$ and transmitting its rate $R_{21}$ output to both nodes and its rate $R_{22}$ output to only $j_2$. The code also employs mappings $\gamma_N^{(1)} : \mathcal W_{11} \to \mathcal U_1^N$ and $\gamma_N^{(2)} : \mathcal W_{11} \times \mathcal W_{12} \to \mathcal U_2^N$.

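The random code design draws $\{\gamma_N^{(1)}(w_{11}) : w_{11} \in \mathcal W_{11}\}$ i.i.d. from the distribution $\prod_{\ell=1}^N p(u_1(\ell))$. For each $w_{11} \in \mathcal W_{11}$, let $\underline U_1 = \gamma_N^{(1)}(w_{11})$ and draw codewords $\{\gamma_N^{(2)}(w_{11},w_{12}) : w_{12} \in \mathcal W_{12}\}$ i.i.d. from $\prod_{\ell=1}^N p(u_2(\ell)|U_1(\ell))$ and codewords $\{\beta_N^{(1)}(w_{11},w_{21}) : w_{21} \in \mathcal W_{21}\}$ i.i.d. from $\prod_{\ell=1}^N p(y_1(\ell)|U_1(\ell))$. Finally, for each $(w_{11},w_{12},w_{21}) \in \mathcal W_{11} \times \mathcal W_{12} \times \mathcal W_{21}$, let

$$(\underline U_1, \underline U_2, \underline Y_1) = \big(\gamma_N^{(1)}(w_{11}),\ \gamma_N^{(2)}(w_{11},w_{12}),\ \beta_N^{(1)}(w_{11},w_{21})\big),$$

and draw $\{\beta_N^{(2)}(w_{11},w_{12},w_{21},w_{22}) : w_{22} \in \mathcal W_{22}\}$ i.i.d. from $\prod_{\ell=1}^N p(y_2(\ell)|U_1(\ell),U_2(\ell),Y_1(\ell))$.

Choose $\alpha_N^{(11)}(\underline x_1)$ uniformly at random from the indices $w_{11} \in \mathcal W_{11}$ for which $(\gamma_N^{(1)}(w_{11}), \underline x_1) \in \hat A_\epsilon^{(N)}$; if there is no such index, then set $\alpha_N^{(11)}(\underline x_1)$ to 1. Let $w_{11}$ be the chosen index and $\underline U_1 = \gamma_N^{(1)}(w_{11})$. Choose $\alpha_N^{(12)}(\underline x_1)$ uniformly at random from the indices $w_{12} \in \mathcal W_{12}$ for which $(\underline U_1, \gamma_N^{(2)}(w_{11},w_{12}), \underline x_1) \in \hat A_\epsilon^{(N)}$; if there is no such index $w_{12}$, then set $\alpha_N^{(12)}(\underline x_1)$ to 1. Let $w_{12}$ be the chosen index, and let $\underline U_2 = \gamma_N^{(2)}(w_{11},w_{12})$. Choose $\alpha_N^{(21)}(\underline x_1,\underline x_2)$ uniformly at random from the set of $w_{21} \in \mathcal W_{21}$ for which

$$\big(\underline U_1, \underline x_1, \underline x_2, \beta_N^{(1)}(w_{11},w_{21})\big) \in \hat A_\epsilon^{(N)};$$

if this set is empty, then $\alpha_N^{(21)}(\underline x_1,\underline x_2)$ is set to 0. Let $w_{21}$ be the chosen index and set $\underline Y_1 = \beta_N^{(1)}(w_{11},w_{21})$; choose $\alpha_N^{(22)}(\underline x_1,\underline x_2)$ uniformly at random from the set of $w_{22} \in \mathcal W_{22}$ for which

$$\big(\underline U_1, \underline U_2, \underline x_1, \underline x_2, \underline Y_1, \beta_N^{(2)}(w_{11},w_{12},w_{21},w_{22})\big)$$

is typical; if this set is empty, then $\alpha_N^{(22)}(\underline x_1,\underline x_2)$ is set to 0.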
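The layered structure of this random code, in which each inner codebook is drawn conditionally on one outer codeword, can be sketched as below. This is a minimal illustration only: it covers the first two layers (the $\gamma^{(1)}$ and $\gamma^{(2)}$ codewords), the alphabets and pmfs are toy assumptions, and the function names are hypothetical.

```python
import random


def draw_sequence(pmf, n, rng):
    """Draw n symbols i.i.d. from pmf (a dict symbol -> probability)."""
    symbols = list(pmf)
    return tuple(rng.choices(symbols, weights=[pmf[s] for s in symbols], k=n))


def draw_conditional_sequence(cond_pmf, parent, rng):
    """Draw one symbol per position l from cond_pmf[parent[l]]."""
    out = []
    for s in parent:
        symbols = list(cond_pmf[s])
        weights = [cond_pmf[s][t] for t in symbols]
        out.append(rng.choices(symbols, weights=weights)[0])
    return tuple(out)


def draw_nested_codebooks(p_u1, p_u2_given_u1, n, m11, m12, rng):
    """Outer codebook: m11 codewords i.i.d. from p_u1.
    Inner codebooks: for each outer codeword, m12 codewords drawn
    position by position from p_u2_given_u1 given that outer codeword."""
    outer = [draw_sequence(p_u1, n, rng) for _ in range(m11)]
    inner = [[draw_conditional_sequence(p_u2_given_u1, outer[w11], rng)
              for _ in range(m12)]
             for w11 in range(m11)]
    return outer, inner


# Degenerate conditional pmf (u2 always equals u1) makes the nesting visible.
p_u1 = {0: 0.5, 1: 0.5}
p_u2_given_u1 = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}
outer, inner = draw_nested_codebooks(p_u1, p_u2_given_u1, 8, 2, 3, random.Random(0))
```

The remaining layers (the $\beta^{(1)}$ and $\beta^{(2)}$ codewords) would follow the same pattern, conditioning on more of the previously drawn sequences.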
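For all $(\underline x_1,\underline x_2,\underline y_1,\underline y_2) \in \hat A_\epsilon^{(N)}$, by Lemma 25 below,

$$\hat p(\underline y_1,\underline y_2|\underline x_1,\underline x_2) \le 2^{N(2\sum_{k=1}^4 b_k(\epsilon_k)+4/N)}\, p(\underline y_1,\underline y_2|\underline x_1,\underline x_2),$$

where $b_1 = 2a_1(\epsilon_1) + \epsilon_1$ and $b_k = 4a_k(\epsilon_k)$, $k \in \{2,3,4\}$. By Lemma 26 below,

$$\hat p\big((\hat A_\epsilon^{(N)}(\underline X_1,\underline X_2,\underline Y_1,\underline Y_2))^c \,\big|\, \underline x_1,\underline x_2\big)$$

$$\le \delta_{11}+\delta_{12}+\delta_{21}+\delta_{22} + p\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline X_1))^c \,\big|\, \underline x_1\big) + 2^{N(2b_1(\epsilon_1)+1/N)}\, p\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline U_2,\underline X_1))^c \,\big|\, \underline x_1\big)$$

$$+\ 2^{N(2\sum_{k=1}^2 b_k(\epsilon_k)+2/N)}\, p\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline X_1,\underline X_2,\underline Y_1))^c \,\big|\, \underline x_1,\underline x_2\big)$$

$$+\ 2^{N(2\sum_{k=1}^3 b_k(\epsilon_k)+3/N)}\, p\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline U_2,\underline X_1,\underline X_2,\underline Y_1,\underline Y_2))^c \,\big|\, \underline x_1,\underline x_2\big),$$

where

$$\delta_{11} = e^{-2^{N(R_{11}-I(X_1;U_1)-b_1(\epsilon_1))}}, \qquad \delta_{12} = e^{-2^{N(R_{12}-I(X_1;U_2|U_1)-b_2(\epsilon_2))}},$$

$$\delta_{21} = e^{-2^{N(R_{21}-I(X_1,X_2;Y_1|U_1)-b_3(\epsilon_3))}}, \qquad \delta_{22} = e^{-2^{N(R_{22}-I(X_1,X_2;Y_2|U_1,U_2,Y_1)-b_4(\epsilon_4))}}.$$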
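Lemma 24 above gives

$$p\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline X_1))^c\big) \le 2^{-Nc_1(\epsilon_1)}$$

$$p\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline U_2,\underline X_1))^c\big) \le 2^{-Nc_2(\epsilon_2)}$$

$$p\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline X_1,\underline X_2,\underline Y_1))^c\big) \le 2^{-Nc_3(\epsilon_3)}$$

$$p\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline U_2,\underline X_1,\underline X_2,\underline Y_1,\underline Y_2))^c \cup (\hat A_\epsilon^{(N)}(\underline X_1,\underline X_2,\underline Y_1,\underline Y_2))^c\big) \le 2^{-Nc_4(\epsilon_4)}$$

for all $N$ sufficiently large, where each $c_k(\epsilon,t)$ approaches 0 as $\epsilon_k(t)$ approaches 0. Thus setting $\nu = 3\sum_{k=1}^4 b_k(\epsilon_k)$ gives

$$P_e^{(N)}(\nu) \le \delta_{11}+\delta_{12}+\delta_{21}+\delta_{22} + 2^{-Nc_1(\epsilon_1)} + 2^{-N(c_2(\epsilon_2)-2b_1(\epsilon_1)-1/N)}$$

$$+\ 2^{-N(c_3(\epsilon_3)-2\sum_{k=1}^2 b_k(\epsilon_k)-2/N)} + 2^{-N(c_4(\epsilon_4)-2\sum_{k=1}^3 b_k(\epsilon_k)-3/N)}$$

for $N$ sufficiently large. Thus sequentially choosing $\epsilon_4$, $\epsilon_3$, $\epsilon_2$, and $\epsilon_1$ to satisfy

$$b_4(\epsilon_4) < R_{22} - I(X_1,X_2;Y_2|U_1,U_2,Y_1)$$

$$b_3(\epsilon_3) < \min\{R_{21} - I(X_1,X_2;Y_1|U_1),\ c_4(\epsilon_4)/6\}$$

$$b_2(\epsilon_2) < \min\{R_{12} - I(X_1;U_2|U_1),\ c_4(\epsilon_4)/6,\ c_3(\epsilon_3)/4\}$$

$$b_1(\epsilon_1) < \min\{R_{11} - I(X_1;U_1),\ c_4(\epsilon_4)/6,\ c_3(\epsilon_3)/4,\ c_2(\epsilon_2)/2\}$$

yields an error probability $P_e^{(N)}(\nu)$ that decays exponentially to zero. The exponent approaches 0 as $\epsilon_1$, $\epsilon_2$, $\epsilon_3$, and $\epsilon_4$ approach 0, which gives the desired result by Theorem 4. ■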



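Lemma 25. For all $(\underline u_1, \underline x_1) \in \hat A_\epsilon^{(N)}$,

$$\hat p(\underline u_1|\underline x_1) \le 2^{N(4a_1(\epsilon_1)+2\epsilon_1+1/N)}\, p(\underline u_1|\underline x_1);$$

if, further, $(\underline u_1,\underline u_2,\underline x_1) \in \hat A_\epsilon^{(N)}$, then

$$\hat p(\underline u_2|\underline u_1,\underline x_1) \le 2^{N(8a_2(\epsilon_2)+1/N)}\, p(\underline u_2|\underline u_1,\underline x_1);$$

if, in addition, $(\underline u_1,\underline x_1,\underline x_2,\underline y_1) \in \hat A_\epsilon^{(N)}$, then

$$\hat p(\underline y_1|\underline u_1,\underline x_1,\underline x_2) \le 2^{N(8a_3(\epsilon_3)+1/N)}\, p(\underline y_1|\underline u_1,\underline x_1,\underline x_2);$$

if also $(\underline u_1,\underline u_2,\underline x_1,\underline x_2,\underline y_1,\underline y_2) \in \hat A_\epsilon^{(N)}$, then

$$\hat p(\underline y_2|\underline u_1,\underline u_2,\underline x_1,\underline x_2,\underline y_1) \le 2^{N(8a_4(\epsilon_4)+1/N)}\, p(\underline y_2|\underline u_1,\underline u_2,\underline x_1,\underline x_2,\underline y_1).$$

Thus, if $(\underline x_1,\underline x_2,\underline y_1,\underline y_2) \in \hat A_\epsilon^{(N)}$,

$$\hat p(\underline y_1,\underline y_2|\underline x_1,\underline x_2) \le 2^{N(4a_1(\epsilon_1)+2\epsilon_1+8\sum_{k=2}^4 a_k(\epsilon_k)+4/N)}\, p(\underline y_1,\underline y_2|\underline x_1,\underline x_2).$$

Proof. The proof follows the same outline as the preceding examples. ■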

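Lemma 26 bounds the probability of observing atypical strings using the code designed in Theorem 8.

Lemma 26. Let $b_1(\epsilon_1) = 4a_1(\epsilon_1) + 2\epsilon_1 + 1/N$ and $b_k(\epsilon_k) = 8a_k(\epsilon_k) + 1/N$, $k \in \{2,3\}$. Then

$$\hat p\big((\hat A_\epsilon^{(N)}(\underline X_1,\underline X_2,\underline Y_1,\underline Y_2))^c \,\big|\, \underline x_1,\underline x_2\big)$$

$$\le \delta_{11}+\delta_{12}+\delta_{21}+\delta_{22} + p\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline X_1))^c \,\big|\, \underline x_1\big) + 2^{Nb_1(\epsilon_1)}\, p\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline U_2,\underline X_1))^c \,\big|\, \underline x_1\big)$$

$$+\ 2^{N\sum_{k=1}^2 b_k(\epsilon_k)}\, p\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline X_1,\underline X_2,\underline Y_1))^c \,\big|\, \underline x_1,\underline x_2\big)$$

$$+\ 2^{N\sum_{k=1}^3 b_k(\epsilon_k)}\, p\big((\hat A_\epsilon^{(N)}(\underline U_1,\underline U_2,\underline X_1,\underline X_2,\underline Y_1,\underline Y_2))^c \,\big|\, \underline x_1,\underline x_2\big),$$

where

$$\delta_{11} = e^{-2^{N(R_{11}-I(X_1;U_1)-b_1(\epsilon_1))}}, \qquad \delta_{12} = e^{-2^{N(R_{12}-I(X_1;U_2|U_1)-b_2(\epsilon_2))}},$$

$$\delta_{21} = e^{-2^{N(R_{21}-I(X_1,X_2;Y_1|U_1)-b_3(\epsilon_3))}}, \qquad \delta_{22} = e^{-2^{N(R_{22}-I(X_1,X_2;Y_2|U_1,U_2,Y_1)-b_4(\epsilon_4))}}.$$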